## What you’ll measure
Test the bundle a user receives for a $4.99 starter pack SKU (`iap_starter_pack_4_99`). The price is fixed; the contents vary:
| Variant | Gems given | Bonus content | Hypothesis |
|---|---|---|---|
| Control | 500 gems | + 1 character skin | Baseline. |
| Variant A | 750 gems (+50%) | + 1 skin | Better gems-per-dollar lifts conversion. |
| Variant B | 500 gems | + 2 skins (+1 from control) | Cosmetic-driven players convert better with bonus content. |
## Pre-flight check
Confirm that bundle composition is served from a backend config endpoint and not bundled into the app binary. Open the Fleack backoffice, navigate to Endpoints, and look for your shop config endpoint. Check the body sample for paths like:

- `data.shop.bundles[?id=starter_pack].gems`
- `data.shop.bundles[?id=starter_pack].bonus_items`
- `data.iap.starter_pack.contents`
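For reference, a body sample containing the first two paths might look like the following (a hypothetical shape; your bundle objects may carry extra fields):

```python
# Hypothetical shop config body; field names follow the paths above.
shop_config = {
    "data": {
        "shop": {
            "bundles": [
                {"id": "starter_pack", "gems": 500, "bonus_items": 1},
                {"id": "mega_pack", "gems": 3000, "bonus_items": 5},
            ]
        }
    }
}

# data.shop.bundles[?id=starter_pack].gems resolves by filtering the array:
starter = next(b for b in shop_config["data"]["shop"]["bundles"]
               if b["id"] == "starter_pack")
print(starter["gems"], starter["bonus_items"])  # 500 1
```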
You’ll also use `POST /api/iap/validate` (the endpoint your app calls after StoreKit or Play Billing receipt verification) as the purchase signal. If that call doesn’t exist or doesn’t pass through Fleack’s proxy, the conversion metric won’t fire.
## Main workflow
### Declare the levers
You need two levers, both pointing into the same shop endpoint.

**Lever 1 — Gems per starter pack**

Click **+ New lever** in the Levers page:
- Pick the shop config endpoint.
- Search for `gems` in the path picker and select `data.shop.bundles[?id=starter_pack].gems`.
- Set Label: `Starter pack gems`, Type: `number`, Test suggestions: `500, 750, 1000`.
**Lever 2 — Starter pack bonus skins**

- Same endpoint.
- Select `data.shop.bundles[?id=starter_pack].bonus_items`.
- Set Label: `Starter pack bonus skins`, Type: `number`, Test suggestions: `1, 2, 3`.
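Both levers target one array item selected by the `[?id=starter_pack]` filter. A minimal sketch of that rewrite semantics (illustrative only, not Fleack's internal implementation):

```python
import copy

def apply_override(config, array_path, match_key, match_value, field, new_value):
    """Rewrite one field on the array item matching the filter;
    every other item and field is left untouched."""
    patched = copy.deepcopy(config)
    node = patched
    for key in array_path:                 # walk e.g. ["data", "shop", "bundles"]
        node = node[key]
    for item in node:
        if item.get(match_key) == match_value:
            item[field] = new_value
    return patched

cfg = {"data": {"shop": {"bundles": [
    {"id": "starter_pack", "gems": 500, "bonus_items": 1},
    {"id": "mega_pack", "gems": 3000, "bonus_items": 5},
]}}}

# Variant A of the gems test: 500 -> 750 on the starter pack only.
patched = apply_override(cfg, ["data", "shop", "bundles"],
                         "id", "starter_pack", "gems", 750)
```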
The `[?id=starter_pack]` filter syntax in the path picker matches the correct object inside an array of bundle configs. Fleack rewrites only that item and leaves the rest of the array untouched.

### Set up the tests
Run two separate tests, not one combined test that varies both levers at once. Combined tests inflate variance, slow down decisions, and make it impossible to attribute the result to a single change.

**Gems test:**
- Variants: 500 (control), 750, 1000
- Allocation: 33% / 33% / 34%
- Segment: `days_since_install ≥ 1` AND `has_purchased = false` — you’re measuring first-purchase conversion, not upsell behaviour.
- Primary metric: Conversion on `POST /api/iap/validate`, conversion window 24 hours after exposure.
- Secondary metric: Scalar delta on `total_revenue_usd`, observation window 14 days.
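The 33% / 33% / 34% split is typically realised as deterministic hash bucketing, so a given user always lands in the same variant across sessions. A sketch of that idea (the hashing scheme here is an assumption, not Fleack's documented algorithm):

```python
import hashlib

def assign_variant(user_id: str, test_id: str) -> str:
    """Stable 33/33/34 bucketing for the gems test variants."""
    digest = hashlib.sha256(f"{test_id}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100        # stable value in 0..99
    if bucket < 33:
        return "500"                      # control
    if bucket < 66:
        return "750"
    return "1000"
```

Hashing on `(test_id, user_id)` keeps the two tests' assignments independent of each other.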
**Bonus skins test:** same configuration, but vary the `bonus_items` lever instead.

Click **Launch** on both. They run independently, each with their own exposure counts and result panels.

### Watch the results
Pricing tests produce signal more slowly than engagement tests because purchase rates are low — typically 1–3% on a starter pack.

Realistic timelines for a 200K DAU game:
- 1,000+ exposures per variant within a few hours of launch
- First conversion events within a day
- Statistically meaningful conversion read at 5,000–10,000 exposures per variant
- Decisive ARPPU verdict at 14 days
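Those exposure numbers can be sanity-checked with a standard two-proportion sample-size estimate (normal approximation; the 1.5% baseline and +0.5-point lift below are illustrative assumptions, not numbers from your game):

```python
import math

def samples_per_variant(p_control: float, p_variant: float) -> int:
    """n per arm for a two-sided two-proportion z-test,
    alpha = 0.05, power = 0.80 (z values hard-coded)."""
    z_alpha, z_beta = 1.96, 0.84
    p_bar = (p_control + p_variant) / 2
    num = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * math.sqrt(p_control * (1 - p_control)
                                + p_variant * (1 - p_variant))) ** 2
    return math.ceil(num / (p_variant - p_control) ** 2)

n = samples_per_variant(0.015, 0.020)  # detect 1.5% -> 2.0% conversion
```

At these illustrative rates the estimate lands near 10,800 exposures per variant; detecting smaller lifts needs proportionally more.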
### Make the call
Pricing tests have a specific failure mode: Variant A wins on conversion but loses on ARPPU. This means you sold the starter pack to users who would have bought a more expensive bundle later — you traded future revenue for a short-term conversion bump.

Use this decision rule:
If both tests declare a winner, promote each independently — the two levers are orthogonal.
| Verdict | Condition | Action |
|---|---|---|
| Promote | Variant wins on both primary AND secondary: ≥ 90% win probability on conversion AND ≥ 5% ARPPU uplift | Click Promote |
| Reject | Variant wins on conversion but ARPPU delta is flat or negative | Stop the test, keep control |
| Run longer | Mixed verdict at 14 days | Extend to 21–28 days for the LTV signal to stabilise |
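Encoded as code, the rule reads (thresholds taken from the table; the function and verdict names are illustrative):

```python
def verdict(win_probability: float, arppu_uplift_pct: float,
            days_elapsed: int) -> str:
    """Decision rule from the table above."""
    if win_probability >= 0.90 and arppu_uplift_pct >= 5.0:
        return "promote"        # wins on both primary AND secondary
    if win_probability >= 0.90 and arppu_uplift_pct <= 0.0:
        return "reject"         # conversion win, flat/negative ARPPU
    if days_elapsed >= 14:
        return "run longer"     # mixed verdict: extend to 21-28 days
    return "keep watching"
```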
## Common pitfalls
- Don’t test below your floor price. Some regions and currencies have minimum IAP tier amounts enforced by the platform. Test parameters around the price (bundle contents, bonus items) — never the SKU price tier itself.
- Watch for cohort drift. Long-running pricing tests (21+ days) can be confounded by seasonal spend patterns. Compare users by `days_since_install` bracket, not by calendar date, to keep cohorts equivalent.
- Don’t cross-test cosmetics with currency. Cosmetics (skins, avatars) and soft currency (gems, coins) behave very differently — cosmetics are identity-driven upgrades, currency is a utility-driven staple. Run them as separate tests on separate user cohorts so the results stay interpretable.
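A sketch of the install-age bracketing suggested above (the bracket edges are illustrative):

```python
from datetime import date

def install_age_bracket(install_date: date, today: date) -> str:
    """Compare cohorts by days since install, not calendar date."""
    days = (today - install_date).days
    if days <= 7:
        return "d1-7"
    if days <= 30:
        return "d8-30"
    return "d31+"
```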