Whale-Skewed A/B Test
Significance Grader.
Standard statistical calculators assume normally distributed metrics. For apps and games where spend is heavily skewed by a few high-paying "whales," use this calculator to estimate true statistical runtime requirements.
Test Inputs
A/B Test Grader Outcomes
The mathematics of skewed experiments.
A/B testing is a tool for capital protection. Running experiments on whale-skewed ARPU metrics without adjusting your variance metrics guarantees false-positive data.
The Whale-Variance Trap
In standard t-tests, sample size scales with variance. Because a tiny fraction of players (whales) can spend $100+ while others spend $0, the variance is orders of magnitude larger than the mean. Standard calculators miss this entirely.
Underpowered Noise Chase
If you run a test underpowered (e.g. stopping at 10,000 users when you need 80,000), a single random whale transaction in variation B will spike the mean ARPU. You will declare a "winner" that was purely random noise.
Pareto Alpha Mechanics
The Pareto Alpha parameter (α) defines how concentrated spend is. An alpha of 1.15 represents extreme whale concentration (common in deep-Gacha RPGs), requiring huge sample volumes. An alpha of 2.2 represents flat spend.
The MDE Leverage Factor
Minimum Detectable Effect (MDE) is the scale of lift you care about. Detecting a minor 2% change requires a massive sample base. Scaling your target MDE to 10% reduces sample sizes by 25x, enabling faster runtimes.
Establish clean, math-validated testing frameworks.
Product teams regularly burn millions of dollars shipping features that failed A/B significance parameters due to whale dilution noise. We audit analytics architectures, design variance-adjusted sample engines, map cohort testing schedules, and configure clean telemetry metrics.