MONETIZATION UTILITY · SHEET 08

Whale-Skewed A/B Test
Significance Grader.

Standard statistical calculators assume normally distributed metrics. For apps and games where spend is heavily skewed by a few high-paying "whales," use this calculator to estimate true statistical runtime requirements.

FILE · AB_TEST_08

Test Inputs

01 · Baseline IAP Conversion %

02 · Average Order Value (AOV) $

03 · Whale Skew (Pareto Shape) α=

04 · Minimum Detectable Lift (MDE) %

05 · Total Daily Traffic (DAU)

06 · Statistical Power (1 - β)

07 · Significance Level (α)

True Test Duration

0 Days

Required test duration to safely detect IAP lift under whale skew

True Sample Size Required

Required sample size *per variation* adjusting for spend variance

Standard Calc Duration

0 Days

Underpowered runtime computed by naive calculators

Mean ARPU

$0.00

Baseline average revenue per active user

SAMPLE SIZE REQUIREMENT: THE UNDERPOWERED DANGER ZONE

SAMPLE SIZE TRAJECTORY: EFFECT OF WHALE SKEW CONCENTRATION

02Statistical Principles

The mathematics of skewed experiments.

A/B testing is a tool for capital protection. Running experiments on whale-skewed ARPU metrics without adjusting your variance metrics guarantees false-positive data.

VARIANCE EXPLOSION

The Whale-Variance Trap

In standard t-tests, sample size scales with variance. Because a tiny fraction of players (whales) can spend $100+ while others spend $0, the variance is orders of magnitude larger than the mean. Standard calculators miss this entirely.

Statistical Friction: Whale Outliers

FALSE POSITIVES

Underpowered Noise Chase

If you run a test underpowered (e.g. stopping at 10,000 users when you need 80,000), a single random whale transaction in variation B will spike the mean ARPU. You will declare a "winner" that was purely random noise.

Statistical Friction: Early Halting

PARETO SHAPE

Pareto Alpha Mechanics

The Pareto Alpha parameter (α) defines how concentrated spend is. An alpha of 1.15 represents extreme whale concentration (common in deep-Gacha RPGs), requiring huge sample volumes. An alpha of 2.2 represents flat spend.

Statistical Friction: Pareto Shape

LIFT VELOCITY

The MDE Leverage Factor

Minimum Detectable Effect (MDE) is the scale of lift you care about. Detecting a minor 2% change requires a massive sample base. Scaling your target MDE to 10% reduces sample sizes by 25x, enabling faster runtimes.

Statistical Friction: MDE Scaling

DECISION-GRADE EXPERIMENTATION & ANALYTICS AUDIT

Establish clean, math-validated testing frameworks.

Product teams regularly burn millions of dollars shipping features that failed A/B significance parameters due to whale dilution noise. We audit analytics architectures, design variance-adjusted sample engines, map cohort testing schedules, and configure clean telemetry metrics.

Book a Testing Framework Audit