heyoo.ai

A/B Testing

A/B testing, also called split testing, is the practice of running a controlled experiment between two versions of a marketing asset to learn which one moves a target metric more. The audience is split randomly, the variants run in parallel, and the winner is the one that produced a statistically meaningful lift.

It is the closest thing marketing has to the scientific method. Done well, it replaces opinions with evidence and stops the highest-paid person's instinct from steering decisions. Done poorly, it produces a steady stream of false positives that look like wins on a slide and never repeat in the wild.

Key takeaways

  • The statistical significance threshold is typically 95%, meaning that if the variant had no real effect, a lift this large would show up by chance no more than 5% of the time.
  • Most B2B tests need 2 to 4 weeks of traffic to reach significance; running shorter tests is a leading cause of false-positive launches.
  • Test one variable at a time when you can. Multivariate tests are valid but require several times more traffic to read clearly.

What is A/B testing?

A/B testing is a randomized experiment with two variants: a control (the current version) and a treatment (the proposed change). Visitors are randomly assigned to one of the two, and a chosen metric (signups, clicks, revenue per session) is compared across the groups.

The goal is to isolate the effect of the change. Random assignment removes most confounding variables: time of day, traffic source, day of week, and so on cancel out across both groups. What remains is, statistically, the effect of the variant itself.
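
A minimal sketch of how random assignment is often implemented in practice, assuming each visitor has a stable identifier such as a first-party cookie value. The function name, the experiment key, and the choice of SHA-256 are illustrative assumptions, not something the article prescribes. Hashing the ID gives a split that behaves like a random one while always returning the same variant for the same visitor.

```python
import hashlib

def assign_variant(visitor_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a visitor to 'control' or 'treatment' (hypothetical helper)."""
    # Include the experiment name so different tests get independent splits.
    key = f"{experiment}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an integer and scale it to [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "treatment" if bucket < treatment_share else "control"

# The same visitor always lands in the same group for a given experiment.
print(assign_variant("visitor-123", "hero-cta"))
print(assign_variant("visitor-123", "hero-cta"))
```

Because the assignment depends only on the visitor ID and the experiment name, no lookup table is needed to keep returning visitors in the same group.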

A/B testing is sometimes confused with before-and-after comparisons. They are not the same. A before-and-after view (last month versus this month) mixes the change you made with everything else that happened in the world that month. A proper split test runs the variants concurrently against comparable audiences.

How do you run a valid A/B test?

A defensible A/B test follows the same six steps:

  1. State the hypothesis. "Changing the hero CTA from 'Get a demo' to 'See it work' will increase signup conversion by at least 10%."
  2. Pick one primary metric. Multiple primary metrics multiply your chance of a false positive.
  3. Calculate the required sample size before launching, using your baseline conversion rate, the minimum lift you care about, a 95% confidence threshold, and a target statistical power (80% is a common choice).
  4. Randomize at the visitor level, not the session level, so a returning visitor sees the same variant.
  5. Run the test for at least one full business cycle (typically 2 weeks for B2B), even if significance appears earlier.
  6. Read the result. Ship the winner only if it cleared significance and the absolute lift is operationally meaningful.
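
For the final step, one common way to check whether a result cleared significance is a two-proportion z-test. The article does not name a specific test, so treat this as one reasonable option rather than the required method. The conversion counts below are made-up illustration values.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (absolute lift, z statistic, two-sided p-value) for B versus A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, z, p_value

# Hypothetical counts: 300 of 10,000 control visitors and 360 of 10,000 treatment visitors converted.
lift, z, p_value = two_proportion_z_test(300, 10_000, 360, 10_000)
print(f"absolute lift={lift:.4f}, z={z:.2f}, p={p_value:.4f}")
print("significant at 95%" if p_value < 0.05 else "not significant at 95%")
```

A p-value below 0.05 corresponds to the 95% threshold; whether the absolute lift is worth shipping remains a separate business judgment.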

Most teams underinvest in steps 3 and 5. Without a pre-committed sample size, peeking at the dashboard daily and stopping as soon as the chart looks favourable pushes the false positive rate well above the nominal 5%.
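
The effect of peeking can be illustrated with a small simulation of A/A tests, where both groups share the same true conversion rate and any "win" is a false positive. This is a rough sketch under assumed values (14 daily checks, 200 visitors per group per day, a 10% conversion rate), not a model of any particular site.

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided 95% threshold

def is_significant(conv_a: int, conv_b: int, n: int) -> bool:
    """Two-proportion z-test with n visitors in each group."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return False
    return abs(conv_b - conv_a) / n / se > Z_CRIT

def simulate_aa_tests(n_tests=400, days=14, daily_visitors=200, rate=0.10, seed=7):
    """Return (false positive rate with daily peeking, rate when read once at the end)."""
    rng = random.Random(seed)
    peeking_wins = 0
    final_wins = 0
    for _ in range(n_tests):
        conv_a = conv_b = n = 0
        crossed_on_some_day = False
        for _day in range(days):
            # Both groups draw from the same rate, so there is no true effect.
            conv_a += sum(rng.random() < rate for _ in range(daily_visitors))
            conv_b += sum(rng.random() < rate for _ in range(daily_visitors))
            n += daily_visitors
            if is_significant(conv_a, conv_b, n):
                crossed_on_some_day = True  # a peeker would have stopped here
        peeking_wins += crossed_on_some_day
        final_wins += is_significant(conv_a, conv_b, n)
    return peeking_wins / n_tests, final_wins / n_tests

peeking_rate, fixed_rate = simulate_aa_tests()
print(f"false positives with daily peeking: {peeking_rate:.1%}")
print(f"false positives with one final read: {fixed_rate:.1%}")
```

The single final read should land near the nominal 5%, while the peeking rate comes out noticeably higher because each daily look is another chance for noise to cross the threshold.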

Common A/B testing mistakes

Three failure modes show up over and over:

  • Stopping early. The most common error. A test that crosses 95% confidence on day three rarely holds at day fourteen. Pre-commit to a sample size and a duration.
  • Testing micro changes that cannot move the metric. A button colour test on a landing page with 200 visitors per week will not reach significance in any reasonable timeframe. Reserve A/B tests for changes large enough to detect at your traffic.
  • Ignoring novelty effects. New variants get a temporary engagement bump simply because they are new. Run long enough to see the bump fade.

The fix in all three cases is patience and a pre-registered plan. Decide what you will measure, how long you will run, and what you will ship before any data appears.

A/B testing for low-traffic B2B sites

Most B2B SaaS sites do not have the traffic to A/B test every micro change. A site with 5,000 monthly visitors and a 3% conversion rate needs roughly 14,000 visitors per variant to detect a 20% relative lift at 95% confidence and 80% power, which is more than five months of traffic.
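
To see how traffic limits what a test can detect, here is a rough sample-size sketch for the site described above. It uses a standard normal-approximation formula for comparing two proportions and assumes 80% statistical power, which the article does not specify. The function name and parameters are illustrative.

```python
from math import ceil
from statistics import NormalDist

def visitors_per_variant(baseline: float, relative_lift: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed in each group to detect the given relative lift."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided confidence threshold
    z_power = NormalDist().inv_cdf(power)          # assumed power, not from the article
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

# 3% baseline conversion, 20% relative lift, 5,000 visitors per month split across two variants.
n = visitors_per_variant(baseline=0.03, relative_lift=0.20)
months = 2 * n / 5000
print(f"{n} visitors per variant, about {months:.1f} months of traffic")
```

Under these assumptions the requirement comes to roughly 14,000 visitors per variant. Passing a larger `relative_lift` or a looser `alpha` (for example 0.10) shows how much bigger swings and lower thresholds shorten the wait.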

Three adjustments help. First, test high up the funnel where volume is highest: ad creative, landing-page hero, and email subject lines. Second, test bigger swings (full hero rewrite, different value proposition) where the expected effect size is larger. Third, accept lower confidence thresholds (90% instead of 95%) for low-stakes decisions, while keeping 95% for anything that touches pricing, signup flow, or core messaging.

When A/B testing is impossible, painted-door tests, qualitative interviews, and well-instrumented launches with a holdout group are reasonable substitutes.

Frequently asked questions

How long should an A/B test run?

Long enough to reach the pre-calculated sample size, and at least one full business cycle (typically 2 weeks) to absorb day-of-week effects. Stopping the moment significance appears is a known cause of false positives.

What's the difference between A/B testing and multivariate testing?

A/B testing compares two variants that differ on one variable. Multivariate testing compares multiple variants across multiple variables at once. Multivariate tests can reveal interaction effects but require several times more traffic to read clearly.

Can I A/B test if my B2B site has low traffic?

Sometimes. Test high-volume entry points, test larger changes that produce larger effects, and lower the confidence threshold for low-risk decisions. For pages that get under a few hundred conversions per month, qualitative research often beats waiting six months for a test to read.
