A single 5-point improvement in D1 retention compounds through every downstream metric — D7 retention, ARPDAU, LTV — making onboarding the highest-leverage test you can run on a mobile game. Fleack can reorder and adjust the onboarding steps your app fetches from your backend at first launch, serving different sequences to different new-install cohorts without touching the binary. This recipe covers the full workflow: from identifying the config endpoint in the backoffice to promoting a winning sequence to 100% of new installs.

What you’ll measure

Test the order and pacing of the first three onboarding steps. A common pattern in mid-core games:
| Variant | Step 1 | Step 2 | Step 3 | Hypothesis |
| --- | --- | --- | --- | --- |
| Control | Tutorial battle | Account binding prompt | Reward claim | Current sequence. |
| Variant A | Tutorial battle | Reward claim | Account binding prompt | Showing the reward before the friction lifts D1 retention. |
| Variant B | Reward claim | Tutorial battle | Account binding prompt | Frontloading the reward and deferring everything else maximises early delight. |
Primary metric: D1 retention (binary).
Secondary metric: tutorial completion rate (binary — conversion on the tutorial_complete backend event within 1 hour of exposure).
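As a rough illustration of how these two binary metrics can be computed from raw event timestamps, here is a hypothetical sketch (the helper names and the exact eligibility windows are illustrative, not Fleack's internals):

```python
from datetime import datetime, timedelta

def d1_retained(install_at: datetime, session_starts: list[datetime]) -> bool:
    """One common definition of D1 retention: any session starting
    between 24h and 48h after install. Calendar-day definitions also
    exist; check your analytics convention."""
    return any(
        install_at + timedelta(hours=24) <= s < install_at + timedelta(hours=48)
        for s in session_starts
    )

def tutorial_completed(exposure_at: datetime, complete_events: list[datetime]) -> bool:
    """Binary conversion: a tutorial_complete event within 1 hour of exposure."""
    return any(
        exposure_at <= e <= exposure_at + timedelta(hours=1)
        for e in complete_events
    )
```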

Pre-flight check

Confirm your backend serves the onboarding sequence as a config endpoint, not as hardcoded logic in the binary. Open the Fleack backoffice, navigate to Endpoints, and find the endpoint your app calls at first launch. Check:
  1. Classification: must read config-candidate. If it reads user-data, the response varies per user — check whether you’re looking at the right endpoint.
  2. Body sample: the step array must appear. Typical paths:
    • data.onboarding.steps (array of step IDs)
    • data.tutorial.flow (array of { id, type, params } objects)
    • config.first_session.steps
If the step array isn’t in the body sample, your client is hardcoding the order. You need a one-time backend change to move the sequence into a config endpoint before this test is possible.
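For reference, a qualifying body sample might look like the following, and the path check can be scripted. The field names here are illustrative — match them against your own response (yours may use data.tutorial.flow or another path):

```python
import json

# Hypothetical body sample from a config-candidate endpoint.
body = json.loads("""
{
  "data": {
    "onboarding": {
      "steps": ["battle", "account", "reward"]
    }
  }
}
""")

def get_path(obj, dotted_path):
    """Walk a dotted path like 'data.onboarding.steps'; None if absent."""
    for key in dotted_path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj

steps = get_path(body, "data.onboarding.steps")
# If steps is None, the client is hardcoding the order and you need
# the backend change described above first.
```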

Main workflow

Step 1: Declare the lever

Open the Levers page. The AI enrichment pass may have already detected an onboarding steps lever — look for something labelled “Onboarding steps” or “Tutorial flow”. If it exists, click it, verify the path is correct, and proceed to step 2. If you need to create it manually, click + New lever:
  1. Pick the onboarding config endpoint.
  2. In the path picker, search for steps (or flow, depending on your path) and click the array path.
  3. Set the lever details:
    • Label: Onboarding step order
    • Type: text — Fleack treats the JSON array as a string value so you can paste whole arrays as variant values.
    • Test suggestions: paste the step arrays you plan to test, for example:
      • ["battle","account","reward"] — control sequence
      • ["battle","reward","account"] — Variant A
      • ["reward","battle","account"] — Variant B
Fleack writes the new array in place of the existing one when the test fires, leaving every other field in the response untouched.
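Mechanically, this in-place substitution amounts to replacing one array at one path and leaving every other field alone. A minimal sketch under that description (the helper is hypothetical; the text lever arrives as a JSON string and is parsed back into an array):

```python
import json

def apply_variant(body: dict, dotted_path: str, variant_value: str) -> dict:
    """Replace the array at dotted_path with the variant's value.
    variant_value is the lever's text value, e.g. '["battle","reward","account"]'."""
    keys = dotted_path.split(".")
    node = body
    for key in keys[:-1]:
        node = node[key]
    node[keys[-1]] = json.loads(variant_value)
    return body

body = {
    "data": {
        "onboarding": {"steps": ["battle", "account", "reward"]},
        "version": 7,  # untouched by the substitution
    }
}
apply_variant(body, "data.onboarding.steps", '["battle","reward","account"]')
```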
Step 2: Set up the test

From the lever detail page, click Test.
  • Variant A value: ["battle","reward","account"]
  • Variant B value: ["reward","battle","account"]
  • Allocation: 33% / 33% / 34%
  • Segment: this is critical — restrict to new installs only.
    • Add rule: days_since_install eq 0
    • Add rule: total_sessions lte 1
    Without this segment, returning users may receive a “new” onboarding sequence — that’s a bug report, not an experiment result.
  • Primary metric: Retention day 1 — select the endpoint your app calls on session start (the same session-start endpoint used by all retention metrics).
  • Secondary metric: Conversion on POST /api/onboarding/complete (or your equivalent tutorial-complete event endpoint), conversion window 1 hour after exposure.
Click Launch.
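The two segment rules read as a simple predicate on user attributes. An illustrative sketch (attribute names mirror the rules; this is not Fleack's actual evaluator):

```python
def is_new_install(user: dict) -> bool:
    # days_since_install eq 0 AND total_sessions lte 1
    return user["days_since_install"] == 0 and user["total_sessions"] <= 1

is_new_install({"days_since_install": 0, "total_sessions": 1})   # eligible
is_new_install({"days_since_install": 3, "total_sessions": 40})  # returning user: excluded
```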
Variant assignment is sticky per user. A new install sees the same step sequence on every request for the duration of the test — Fleack hashes test_id + user_identifier to guarantee consistency.
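Sticky assignment of this kind is commonly implemented by hashing the test ID together with a stable user identifier and mapping the hash into allocation buckets — a generic sketch under that assumption (Fleack's exact hash function is not documented here):

```python
import hashlib

def assign_variant(
    test_id: str,
    user_id: str,
    allocation=(("control", 33), ("variant_a", 33), ("variant_b", 34)),
) -> str:
    """Deterministic bucketing: the same user in the same test always
    lands in the same variant, with no per-user state stored."""
    digest = hashlib.sha256(f"{test_id}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # 0..99, split 33/33/34
    threshold = 0
    for name, share in allocation:
        threshold += share
        if bucket < threshold:
            return name
    return allocation[-1][0]
```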
Step 3: Watch the results

Onboarding tests have fast exposure accumulation because every new install triggers them. For a game receiving 1,000 new installs per day:
  • Hundreds of D1-eligible exposures per day per variant
  • A meaningful early read within 5–7 days
  • A stable verdict typically available at 14 days
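Those figures follow from simple arithmetic — at 1,000 new installs per day split roughly three ways, each variant accrues about 330 exposures daily:

```python
installs_per_day = 1_000
smallest_share = 0.33  # smallest variant allocation

exposures_per_day = installs_per_day * smallest_share  # roughly 330 per variant

def days_to_reach(n_exposures: float) -> float:
    """Days until each variant has accrued n_exposures."""
    return n_exposures / exposures_per_day
```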
The test detail page shows per-variant exposure counts, D1 retention rates (marked not yet eligible for users installed fewer than 1 day ago), tutorial completion conversion rates, and Bayesian win probabilities vs control.
Check the tutorial completion secondary metric early. A variant that dramatically suppresses completion rate — even if D1 retention looks flat — is a red flag. Players who skip the tutorial often churn by D7 regardless of how the D1 number looks.
Onboarding test verdicts are noisier than ad frequency tests because your eligible pool is limited to new installs, not your entire DAU. Don’t rush to a conclusion before 14 days — D1 retention has natural weekly variance because weekend cohorts behave differently from weekday cohorts.
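One standard way to compute a Bayesian win probability for a binary metric like D1 retention is a Beta-Binomial Monte Carlo comparison. This is a generic sketch of that technique, not necessarily the exact model behind Fleack's numbers:

```python
import random

def win_probability(conv_v, n_v, conv_c, n_c, draws=20_000, seed=42):
    """P(variant rate > control rate) under independent Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        p_v = rng.betavariate(1 + conv_v, 1 + n_v - conv_v)
        p_c = rng.betavariate(1 + conv_c, 1 + n_c - conv_c)
        wins += p_v > p_c
    return wins / draws

# 420/1000 retained on the variant vs 380/1000 on control: likely,
# but not certainly, a real improvement.
p = win_probability(420, 1000, 380, 1000)
```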
Step 4: Make the call

| Verdict | Condition | Action |
| --- | --- | --- |
| Promote | Variant wins D1 retention with ≥ 90% confidence AND tutorial completion is flat or improved | Click Promote |
| Reject | Variant lifts D1 but tutorial completion drops more than 5 percentage points | Stop the test, keep control |
| Run longer | No clear difference at 14 days | Extend to 21 days — deferred-friction variants sometimes only show their advantage at D7–D14 |
A “deferred friction” variant (e.g. Variant B, which frontloads the reward) may look flat or slightly negative at D1 but reveal a D7 advantage as players who received an early reward feel more invested. If you’re seeing marginal D1 results at 14 days, check whether D7 data is starting to separate before stopping.
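The decision table condenses into a small rule function. Thresholds are copied from the table; the function itself is an illustrative sketch:

```python
def verdict(win_prob: float, completion_delta_pp: float) -> str:
    """win_prob: Bayesian P(variant beats control) on D1 retention.
    completion_delta_pp: variant minus control tutorial completion,
    in percentage points."""
    if win_prob >= 0.90 and completion_delta_pp >= 0:
        return "promote"
    if completion_delta_pp < -5:
        return "reject"          # D1 lift is not worth the completion drop
    return "run_longer"          # no clear difference yet: extend to 21 days
```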
Step 5: Promote the winner

From the test detail page, click Promote on the winning variant. Fleack immediately serves that step sequence to 100% of new installs, on every shipped app version, with no binary update.
If your winning variant defers the account binding step, verify before promoting that you have a fallback in place:
  • A second account-binding nudge later in the funnel (e.g. on session 3 or when the user earns a significant reward).
  • A guest data model that persists progress until binding occurs.
If neither exists, the test has exposed a product gap. Fix the gap first — shipping a variant that permanently defers binding without a fallback will cause data loss on device switches.

Common pitfalls

  • Don’t test onboarding mid-soft-launch. Soft-launch cohorts are small and skewed toward a specific demographic or region. You’ll get noisy, non-generalisable results. Wait until you have reliable, representative daily installs before running onboarding experiments.
  • Don’t expose returning users to the test. The segment rules days_since_install = 0 AND total_sessions ≤ 1 are not optional. A returning user encountering a reshuffled onboarding is a bug, and the resulting confusion will pollute your retention numbers.
  • Watch for store-listing mismatch. If your App Store or Google Play screenshots promise a specific first-screen experience and your winning variant changes it, update the screenshots in your next release. Mismatched expectations between the listing and the actual experience hurt D1 more than any step reorder can lift.
  • Test one thing at a time. Reordering the existing steps is a clean, single-variable change. Don’t also vary step content, reward values, or tutorial skip availability in the same experiment — variant interaction effects will make the results unreadable.

A/B test interstitial ad frequency

Balance ad revenue and D7 retention by testing three interstitial cadences on a live game.

A/B test in-app pricing

Test IAP bundle composition for a fixed-price SKU to lift first-purchase conversion and ARPPU.