MONETIZATION UTILITY · SHEET 08

Whale-Skewed A/B Test
Significance Grader.

Standard statistical calculators assume normally distributed metrics. For apps and games where spend is heavily skewed by a few high-paying "whales," use this calculator to estimate true statistical runtime requirements.

FILE · AB_TEST_08

Test Inputs

01 · Baseline IAP Conversion %
02 · Average Order Value (AOV) $
03 · Whale Skew (Pareto Shape) α=
04 · Minimum Detectable Lift (MDE) %
05 · Total Daily Traffic (DAU)

A/B Test Grader Outcomes

True Sample Size Required
0
Required sample size *per variation* adjusting for spend variance
Standard Calc Duration
0 Days
Underpowered runtime computed by naive calculators
Mean ARPU
$0.00
Baseline average revenue per active user
SAMPLE SIZE REQUIREMENT: THE UNDERPOWERED DANGER ZONE
SAMPLE SIZE TRAJECTORY: EFFECT OF WHALE SKEW CONCENTRATION

The mathematics of skewed experiments.

A/B testing is a tool for capital protection. Running experiments on whale-skewed ARPU metrics without adjusting your variance metrics guarantees false-positive data.

VARIANCE EXPLOSION

The Whale-Variance Trap

In standard t-tests, sample size scales with variance. Because a tiny fraction of players (whales) can spend $100+ while others spend $0, the variance is orders of magnitude larger than the mean. Standard calculators miss this entirely.

Statistical Friction: Whale Outliers
FALSE POSITIVES

Underpowered Noise Chase

If you run a test underpowered (e.g. stopping at 10,000 users when you need 80,000), a single random whale transaction in variation B will spike the mean ARPU. You will declare a "winner" that was purely random noise.

Statistical Friction: Early Halting
PARETO SHAPE

Pareto Alpha Mechanics

The Pareto Alpha parameter (α) defines how concentrated spend is. An alpha of 1.15 represents extreme whale concentration (common in deep-Gacha RPGs), requiring huge sample volumes. An alpha of 2.2 represents flat spend.

Statistical Friction: Pareto Shape
LIFT VELOCITY

The MDE Leverage Factor

Minimum Detectable Effect (MDE) is the scale of lift you care about. Detecting a minor 2% change requires a massive sample base. Scaling your target MDE to 10% reduces sample sizes by 25x, enabling faster runtimes.

Statistical Friction: MDE Scaling
DECISION-GRADE EXPERIMENTATION & ANALYTICS AUDIT

Establish clean, math-validated testing frameworks.

Product teams regularly burn millions of dollars shipping features that failed A/B significance parameters due to whale dilution noise. We audit analytics architectures, design variance-adjusted sample engines, map cohort testing schedules, and configure clean telemetry metrics.