A/B Test Planner
/ab-test
Claude Code builds A/B test plans that are statistically sound and strategically prioritized. It writes the hypothesis, defines the success metric, calculates the sample size needed to detect your target uplift at your chosen confidence level, and sets guardrail metrics to prevent experiments from causing collateral damage.
What this skill does
Write a structured hypothesis statement linking the change to the expected behavior and outcome
Calculate required sample size based on current conversion rate, expected uplift, and confidence level
Define primary success metric and secondary guardrail metrics to monitor during the test
Prioritize the test backlog using an ICE or PIE scoring framework
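The sample-size step above follows the standard two-proportion z-test formula. A minimal sketch, using only the Python standard library; the function and parameter names are illustrative, not part of the skill itself:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline_cvr, relative_uplift,
                            alpha=0.05, power=0.80):
    """Approximate visitors needed per variant for a two-sided
    two-proportion z-test (normal approximation)."""
    p1 = baseline_cvr
    p2 = baseline_cvr * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 at 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # 0.84 at 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# 3.2% baseline CVR, aiming to detect a 20% relative uplift:
n = sample_size_per_variant(0.032, 0.20)
print(n)  # roughly 13,000 visitors per variant
```

Note how sensitive the result is to the minimum detectable effect: halving the uplift you want to detect roughly quadruples the required sample.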
How to use it
Describe the test idea
Share what you want to change, what behavior you expect it to change, and why you believe this change will improve the metric.
Run /ab-test
Provide your current conversion rate, monthly traffic to the page, and the minimum uplift you want to detect.
Run and monitor the test
Use the plan to configure your testing tool and monitor the guardrail metrics daily while the test runs.
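The ICE prioritization mentioned above can be sketched as a simple product score (Impact × Confidence × Ease, each rated 1-10; some teams average instead of multiplying). The backlog entries here are hypothetical:

```python
# Hypothetical test backlog; scores are illustrative.
backlog = [
    {"test": "Pricing page CTA text", "impact": 7, "confidence": 6, "ease": 9},
    {"test": "Signup form social proof", "impact": 6, "confidence": 5, "ease": 8},
    {"test": "Homepage hero redesign", "impact": 8, "confidence": 4, "ease": 3},
]

for item in backlog:
    item["ice"] = item["impact"] * item["confidence"] * item["ease"]

# Highest-scoring tests run first.
for item in sorted(backlog, key=lambda x: x["ice"], reverse=True):
    print(f'{item["test"]}: {item["ice"]}')
```

A high-impact test that is hard to build (like the hero redesign above) often ranks below a cheap, well-understood copy change.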
Example prompts
Plan an A/B test for our pricing page CTA. Current CVR is 3.2%. We get 4,000 visitors per month. We want to test button text: 'Start Free Trial' vs. 'Get Started Free'.
Build a test plan for 3 landing page experiments we're considering. Help me prioritize them and calculate sample sizes for each.
Design an A/B test for our email signup form. Currently converting at 1.8%. We want to test adding social proof below the form.
Who it's for
CRO teams running structured testing programs who want statistically valid test plans
Growth marketers who have ideas but need to validate them before investing in implementation
Marketing managers presenting A/B test plans to leadership for approval
Frequently asked questions
How long should an A/B test run?
Long enough to reach the required sample size at your traffic levels. Claude Code calculates this from your current conversion rate, expected uplift, and traffic. As a rule, never stop a test early because it looks like it's winning; peeking at interim results inflates the false-positive rate. Let it run until it reaches the required sample size.
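Converting a required sample size into a run time is simple arithmetic. A sketch using hypothetical numbers echoing the pricing-page example (the per-variant figure is assumed output from a prior sample-size calculation):

```python
from math import ceil

# Hypothetical inputs for illustration.
visitors_per_month = 4000
required_per_variant = 13000  # from a sample-size calculation
variants = 2                  # control plus one treatment

daily_traffic = visitors_per_month / 30
days = ceil(required_per_variant * variants / daily_traffic)
print(days)  # 195 days at this traffic level
```

This is why low-traffic pages can realistically detect only large uplifts: at 4,000 visitors a month, chasing a 20% relative lift on a 3.2% baseline means a test lasting over six months.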
What confidence level should I use?
95% is the standard for business decisions. For low-risk or exploratory tests, 90% is acceptable. For high-stakes changes like pricing restructures, consider 99%. Claude Code defaults to 95% and explains the tradeoffs.
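The confidence-level tradeoff described above shows up directly in the required sample size. A sketch comparing 90%, 95%, and 99% on the same hypothetical baseline (function name illustrative):

```python
from math import ceil
from statistics import NormalDist

def n_per_variant(p1, p2, alpha, power=0.80):
    """Per-variant sample size for a two-sided two-proportion z-test."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(z ** 2 * variance / (p2 - p1) ** 2)

# 3.2% baseline vs. 3.84% target (a 20% relative uplift):
for conf, alpha in [(90, 0.10), (95, 0.05), (99, 0.01)]:
    print(f"{conf}% confidence: {n_per_variant(0.032, 0.0384, alpha)} per variant")
```

Moving from 95% to 99% confidence raises the sample requirement by roughly half, which is why reserving 99% for high-stakes changes is the usual advice.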
Should I test one thing or multiple things at once?
Test one variable at a time for clean attribution. If you change the headline and the CTA simultaneously and conversion improves, you won't know which change drove it. The exception is multivariate testing, which Claude Code can plan if you have high enough traffic volumes to support it.
Get the full toolkit
All 50 skills included free with every Command Center kit