A/B Test Planner
/ab-test
Claude Code builds A/B test plans that are statistically sound and strategically prioritized. It writes the hypothesis, defines the success metric, calculates the sample size needed to detect your target uplift at your chosen confidence level, and sets guardrail metrics to prevent experiments from causing collateral damage.
What this skill does
Write a structured hypothesis statement linking the change to the expected behavior and outcome
Calculate required sample size based on current conversion rate, expected uplift, and confidence level
Define primary success metric and secondary guardrail metrics to monitor during the test
Prioritize the test backlog using an ICE or PIE scoring framework
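The sample-size step above follows the standard two-proportion z-test formula. A minimal sketch, using only the Python standard library; the function and parameter names are illustrative, not part of the skill itself:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline_cvr, relative_uplift,
                            alpha=0.05, power=0.80):
    """Approximate visitors needed per variant for a two-sided
    two-proportion z-test (normal approximation)."""
    p1 = baseline_cvr
    p2 = baseline_cvr * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 at 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # 0.84 at 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# 3.2% baseline CVR, aiming to detect a 20% relative uplift:
n = sample_size_per_variant(0.032, 0.20)
print(n)  # roughly 13,000 visitors per variant
```

Note how sensitive the result is to the minimum detectable effect: halving the uplift you want to detect roughly quadruples the required sample.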
How to use it
Describe the test idea
Share what you want to change, what behavior you expect it to change, and why you believe this change will improve the metric.
Run /ab-test
Provide your current conversion rate, monthly traffic to the page, and the minimum uplift you want to detect.
Run and monitor the test
Use the plan to configure your testing tool and monitor the guardrail metrics daily while the test runs.
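The ICE prioritization mentioned above can be sketched as a simple product score (Impact × Confidence × Ease, each rated 1-10; some teams average instead of multiplying). The backlog entries here are hypothetical:

```python
# Hypothetical test backlog; scores are illustrative.
backlog = [
    {"test": "Pricing page CTA text", "impact": 7, "confidence": 6, "ease": 9},
    {"test": "Signup form social proof", "impact": 6, "confidence": 5, "ease": 8},
    {"test": "Homepage hero redesign", "impact": 8, "confidence": 4, "ease": 3},
]

for item in backlog:
    item["ice"] = item["impact"] * item["confidence"] * item["ease"]

# Highest-scoring tests run first.
for item in sorted(backlog, key=lambda x: x["ice"], reverse=True):
    print(f'{item["test"]}: {item["ice"]}')
```

A high-impact test that is hard to build (like the hero redesign above) often ranks below a cheap, well-understood copy change.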
Example prompts
Plan an A/B test for our pricing page CTA. Current CVR is 3.2%. We get 4,000 visitors per month. We want to test button text: 'Start Free Trial' vs. 'Get Started Free'.
Build a test plan for 3 landing page experiments we're considering. Help me prioritize them and calculate sample sizes for each.
Design an A/B test for our email signup form. Currently converting at 1.8%. We want to test adding social proof below the form.
Who it's for
CRO teams running structured testing programs who want statistically valid test plans
Growth marketers who have ideas but need to validate them before investing in implementation
Marketing managers presenting A/B test plans to leadership for approval
Frequently asked questions
How long should an A/B test run?
Long enough to reach the required sample size at your traffic levels. Claude Code calculates this from your current conversion rate, expected uplift, and traffic. As a rule, never stop a test early because it looks like it's winning; peeking at interim results inflates the false-positive rate. Let it run until it reaches the required sample size.
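Converting a required sample size into a run time is simple arithmetic. A sketch using hypothetical numbers echoing the pricing-page example (the per-variant figure is assumed output from a prior sample-size calculation):

```python
from math import ceil

# Hypothetical inputs for illustration.
visitors_per_month = 4000
required_per_variant = 13000  # from a sample-size calculation
variants = 2                  # control plus one treatment

daily_traffic = visitors_per_month / 30
days = ceil(required_per_variant * variants / daily_traffic)
print(days)  # 195 days at this traffic level
```

This is why low-traffic pages can realistically detect only large uplifts: at 4,000 visitors a month, chasing a 20% relative lift on a 3.2% baseline means a test lasting over six months.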
What confidence level should I use?
95% is the standard for business decisions. For low-risk or exploratory tests, 90% is acceptable. For high-stakes changes like pricing restructures, consider 99%. Claude Code defaults to 95% and explains the tradeoffs.
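The confidence-level tradeoff described above shows up directly in the required sample size. A sketch comparing 90%, 95%, and 99% on the same hypothetical baseline (function name illustrative):

```python
from math import ceil
from statistics import NormalDist

def n_per_variant(p1, p2, alpha, power=0.80):
    """Per-variant sample size for a two-sided two-proportion z-test."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(z ** 2 * variance / (p2 - p1) ** 2)

# 3.2% baseline vs. 3.84% target (a 20% relative uplift):
for conf, alpha in [(90, 0.10), (95, 0.05), (99, 0.01)]:
    print(f"{conf}% confidence: {n_per_variant(0.032, 0.0384, alpha)} per variant")
```

Moving from 95% to 99% confidence raises the sample requirement by roughly half, which is why reserving 99% for high-stakes changes is the usual advice.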
Should I test one thing or multiple things at once?
Test one variable at a time for clean attribution. If you change the headline and the CTA simultaneously and conversion improves, you won't know which change drove it. The exception is multivariate testing, which Claude Code can plan if you have high enough traffic volumes to support it.
Get the full toolkit
All 50 skills included free with every Command Center kit