- Optimizely X Web Experimentation
- Optimizely X Web Personalization
- Optimizely X Full Stack
THIS ARTICLE WILL HELP YOU:
- Understand Optimizely's Stats Accelerator, its algorithms, and how it affects your results
- Distinguish between the two Stats Accelerator algorithms
- Determine whether to use Stats Accelerator for your experiments, as well as which algorithm to use
- Enable Stats Accelerator (beta) for your account
If you run a lot of experiments, you face two challenges. First, data collection is costly, and time spent experimenting means you have less time to exploit the value of the eventual winner. Second, creating more than one or two variations can delay statistical significance longer than you might like.
Stats Accelerator helps you algorithmically capture more value from your experiments, either by reducing the time to statistical significance or by increasing the number of conversions collected. It does this by monitoring ongoing experiments and using machine learning to adjust traffic distribution among variations.
You may hear Stats Accelerator concepts described as the “multi-armed bandit” or “multi-armed bandit algorithms.” See the Glossary of Optimizely Terminology for definitions of important phrases and concepts.
How Stats Accelerator works
Stats Accelerator applies one of two machine learning algorithms (or optimization strategies) for the primary metric: Accelerate Impact or Accelerate Learnings. Think of these algorithms as two distinct strategies for optimization, each with its own advantages and use cases:
- Accelerate Impact is a regret minimization strategy. Use it when you want to weight visitor experiences toward the leading variation during the experiment lifecycle.
- Accelerate Learnings is a time minimization strategy. Use it when you want to create more variations (at least three) but still reach statistical significance quickly.
The Accelerate Impact algorithm
The Accelerate Impact algorithm is not intended to produce statistical significance. Instead, it works to maximize the payoff of the experiment by showing more visitors the leading variations. For example, if you are trying to increase revenue, Accelerate Impact will determine which variation does that best and then send more traffic to it. As a result, the usual measurements and statistics generated by an A/B test may not be valid for Accelerate Impact.
This may not be what you need, so before switching Accelerate Impact on, be sure you understand the differences between the Stats Accelerator algorithms.
The Accelerate Impact algorithm automatically optimizes your primary metric by dynamically reallocating traffic to whichever variation is performing the best. This will help you extract as much value as possible from the leading variation during the experiment lifecycle, so you avoid the opportunity cost of showing sub-optimal experiences.
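This article does not say which bandit algorithm Accelerate Impact uses internally. As a generic illustration of regret minimization, the sketch below uses Thompson sampling, a common multi-armed bandit technique. The function name, conversion rates, and visitor count are hypothetical and chosen only to show how traffic drifts toward the better-performing variation.

```python
import random


def thompson_allocate(true_rates, visitors, seed=0):
    """Simulate a Thompson-sampling bandit; return visitors shown per variation.

    true_rates are hypothetical conversion rates, unknown to the algorithm.
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    shown = [0] * len(true_rates)
    for _ in range(visitors):
        # Draw a plausible conversion rate for each variation from its
        # Beta posterior (uniform prior), then show the highest draw.
        draws = [rng.betavariate(1 + s, 1 + f)
                 for s, f in zip(successes, failures)]
        arm = draws.index(max(draws))
        shown[arm] += 1
        # Simulate whether this visitor converts.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return shown


# Two variations with assumed true rates of 5% and 10%.
print(thompson_allocate([0.05, 0.10], visitors=5000))
```

Early on, both variations receive similar traffic. As evidence accumulates, most visitors are routed to the variation with the higher observed conversion rate, which is the behavior described above.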
Here are a couple of cases that may be a good fit for Accelerate Impact:
- Promotions and offers: users who sell consumer goods on their site often focus on driving higher conversion rates. One effective way to do this is to offer special promotions that run for a limited time. Using the Accelerate Impact algorithm (instead of running a standard A/B/n test) will send more traffic to the overperforming variations and less traffic to the underperforming variations.
- Long-running campaigns: some Optimizely Personalization users have long-running campaigns to which they continually add variations for each experience. For example, an airline may deliver destination-specific experiences on the homepage based on past searches. Over time, they might add different images and messaging. For long-running Personalization campaigns, the goal is often to drive as many conversions as possible, which makes these campaigns a natural fit for Accelerate Impact.
To use the Accelerate Impact algorithm, you'll need a primary metric and at least two variations, including the original or holdback (baseline) variation.
Metrics are often correlated, so optimizing one optimizes another (for example, revenue and conversion rate). However, if metrics are independent of each other, optimizing the allocation for the primary metric may come at the expense of the secondary metric.
The Accelerate Learnings algorithm
By contrast, the Accelerate Learnings algorithm isn't aimed at any specific business case. It's designed to get an actionable result as quickly as possible, for experiments with a single primary metric tracking unique conversions and at least three variations. Read our Stats Accelerator technical FAQ to learn more.
Accelerate Learnings shortens experiment duration by showing more visitors the variations that have a better chance of reaching statistical significance. Accelerate Learnings attempts to discover as many significant variations as possible.
The advantage of this algorithm is that it will help maximize the number of insights from experiments in a given time frame, so you spend less time waiting for results.
To use the Accelerate Learnings algorithm, you'll need a unique conversion primary metric and at least three variations, including the original or holdback (baseline) variation.
If you're trying to measure more than one metric and have any reason to suspect your secondary metric might not move in the same direction your primary metric does, your experiment is not a good fit for Accelerate Learnings.
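The requirements stated above for the two algorithms can be summarized as a small check. This is only a sketch: the function and parameter names are hypothetical and are not part of any Optimizely API.

```python
# Minimum variation counts, including the original or holdback (baseline).
MIN_VARIATIONS = {"accelerate_impact": 2, "accelerate_learnings": 3}


def can_use(algorithm, variation_count, has_primary_metric,
            primary_is_unique_conversion=False):
    """Return True if an experiment meets the stated requirements."""
    if not has_primary_metric:
        return False
    if variation_count < MIN_VARIATIONS[algorithm]:
        return False
    # Accelerate Learnings also needs a unique-conversion primary metric.
    if algorithm == "accelerate_learnings" and not primary_is_unique_conversion:
        return False
    return True


print(can_use("accelerate_impact", 2, True))           # True
print(can_use("accelerate_learnings", 2, True, True))  # False
```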
To launch Stats Accelerator and implement the best algorithm for your experiment or personalization experience, navigate to the Traffic Allocation tab and select the algorithm you want to use from the Distribution Mode dropdown list.
Stats Accelerator only works in partial factorial mode. Once Stats Accelerator is enabled, you cannot switch directly from partial factorial to full factorial mode. If you want to use full factorial mode, you will have to set your distribution mode to Manual.
Stats Accelerator relies on dynamic traffic allocation to achieve its results. Anytime you allocate traffic dynamically over time, you run the risk of introducing bias into your results. Left uncorrected, this bias can have a significant impact on your reported results. This is known as Simpson's Paradox.
To illustrate this, let's look at the charts below. The first chart shows conversion rates for two variations when traffic allocation is kept static. In this example, conversion rates for both variations begin to decline after each has been seen by 5,000 visitors. And while we see plenty of fluctuation in conversion rates, the gap between the winning and losing variations never strays far from the true lift.
The steady decline in the observed conversion rates shown above is caused by a sudden, one-time shift in the true conversion rates at the point when the experiment has 10,000 total visitors (5,000 per variation).
In the next chart, we see what happens when traffic is dynamically allocated instead, with 90 percent of all traffic directed to the winning variation after each variation has been seen by 5,000 visitors. Here, the winning variation shows the same decline in conversion rates as it did in the previous example. However, because the losing variation has been seen by far fewer visitors, its conversion rates are slower to change.
This gives the impression that the difference between the two variations is much less than it actually is.
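The arithmetic behind this effect can be sketched with expected values. The conversion rates below are hypothetical: the true lift is held at +2 percentage points in both phases, and both true rates drop by 5 points once each variation has been seen by 5,000 visitors. With these particular numbers, pooling across the allocation change does more than shrink the gap: it reverses it.

```python
def pooled_rate(phases):
    """phases: list of (visitors, true_rate). Returns the overall observed rate."""
    visitors = sum(n for n, _ in phases)
    conversions = sum(n * rate for n, rate in phases)
    return conversions / visitors


# Static allocation: 50/50 in both phases.
static_loser = [(5000, 0.10), (5000, 0.05)]
static_winner = [(5000, 0.12), (5000, 0.07)]

# Dynamic allocation: 90% of phase-2 traffic goes to the winner.
dynamic_loser = [(5000, 0.10), (1000, 0.05)]
dynamic_winner = [(5000, 0.12), (9000, 0.07)]

static_lift = pooled_rate(static_winner) - pooled_rate(static_loser)
dynamic_lift = pooled_rate(dynamic_winner) - pooled_rate(dynamic_loser)

print(f"static:  {static_lift * 100:+.2f} pp")   # static:  +2.00 pp
print(f"dynamic: {dynamic_lift * 100:+.2f} pp")  # dynamic: -0.38 pp
```

Under dynamic allocation, most of the winner's visitors arrive during the low-converting phase, so its pooled rate is pulled down more than the loser's.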
Simpson's Paradox is especially dangerous when the true lift is relatively small. In those cases, it can even cause the sign on your results to flip, essentially reporting winning variations as losers and vice versa:
Stats Accelerator neutralizes this bias through a technique we call weighted improvement.
Weighted improvement is designed to estimate the true lift as accurately as possible by breaking down the duration of an experiment into much shorter segments called epochs. These epochs cover periods of constant allocation: in other words, traffic allocation between variations does not change for the duration of each epoch.
Results are calculated for each epoch, which has the effect of minimizing the bias in each individual epoch. At the end of the experiment, these results are all used to calculate the estimated true lift, filtering out the bias that would have otherwise been present.
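The epoch idea can be sketched as follows. This article does not give the exact formula Optimizely uses to combine epochs, so the weighting here (each epoch weighted by its share of total visitors) is an assumption made for illustration. The counts are hypothetical.

```python
def weighted_improvement(epochs):
    """Combine per-epoch differences in conversion rate.

    Each epoch is a period of constant traffic allocation. Weighting by
    the epoch's share of total visitors is an illustrative assumption.
    """
    total = sum(e["base_n"] + e["var_n"] for e in epochs)
    estimate = 0.0
    for e in epochs:
        # Difference in conversion rates within this epoch only.
        diff = e["var_conv"] / e["var_n"] - e["base_conv"] / e["base_n"]
        weight = (e["base_n"] + e["var_n"]) / total
        estimate += weight * diff
    return estimate


epochs = [
    # Epoch 1: 50/50 allocation.
    {"base_n": 5000, "base_conv": 500, "var_n": 5000, "var_conv": 600},
    # Epoch 2: 90% of traffic to the variation, lower rates overall.
    {"base_n": 1000, "base_conv": 50, "var_n": 9000, "var_conv": 630},
]
print(f"{weighted_improvement(epochs) * 100:+.2f} pp")  # +2.00 pp
```

Because each comparison is made inside a period of constant allocation, the shift in traffic between epochs no longer distorts the estimate.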
Impact on reporting and results
When Stats Accelerator is enabled, the experiment's results will differ from other experiments in four visible ways:
Stats Accelerator adjusts the percentage of visitors who see each variation. This means visitor counts will reflect the distribution decisions of the Stats Accelerator.
Stats Accelerator experiments use a different calculation to measure the difference in conversion rates between variations: weighted improvement. Weighted improvement represents an estimate of the true difference in conversion rates, derived from inspecting the individual time intervals between adjustments. See the last question in the Technical FAQ for details ("How does Stats Accelerator handle conversion rates that change over time and Simpson's Paradox?").
Stats Accelerator experiments and campaigns use absolute improvement instead of relative improvement in results to avoid statistical bias and to reduce time to significance.
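Relative improvement is computed as: (variation conversion rate - baseline conversion rate) / baseline conversion rate
Absolute improvement is computed as: variation conversion rate - baseline conversion rate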
Stats Accelerator reports absolute improvements in percentage points, denoted by the "pp" unit:
The winning variation also displays its results in terms of approximate relative improvement. This figure appears just below the absolute improvement (in this example, the relative improvement is -12.15%). It is provided for continuity, so that customers who are accustomed to relative improvement can develop a sense of how the two measures compare.
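To see how the two measures relate, here is a short worked example using the standard definitions of absolute and relative improvement. The conversion rates and function names are hypothetical.

```python
def absolute_improvement_pp(baseline_rate, variation_rate):
    """Difference in conversion rates, in percentage points (pp)."""
    return (variation_rate - baseline_rate) * 100


def relative_improvement_pct(baseline_rate, variation_rate):
    """Difference in conversion rates as a percentage of the baseline."""
    return (variation_rate - baseline_rate) / baseline_rate * 100


baseline, variation = 0.10, 0.12  # 10% and 12% conversion rates
print(f"{absolute_improvement_pp(baseline, variation):+.2f} pp")  # +2.00 pp
print(f"{relative_improvement_pct(baseline, variation):+.2f}%")   # +20.00%
```

The same 2-point gap is a 20% relative improvement on a 10% baseline, but would be only a 4% relative improvement on a 50% baseline.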
Because traffic distribution will be updated frequently, Full Stack customers should implement sticky bucketing to avoid exposing the same visitor to multiple variations. To do this, implement the user profile service. See our developer documentation for more detail.
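As a rough sketch, a user profile service is an object that can look up and save a visitor's stored variation assignments. The class below is a hypothetical in-memory version. The method names and the profile shape follow the pattern described in the Full Stack developer documentation as we understand it, so treat them as assumptions and confirm them against the documentation for your SDK. A production implementation would use a persistent store.

```python
class InMemoryUserProfileService:
    """Hypothetical in-memory store for sticky bucketing decisions."""

    def __init__(self):
        self._profiles = {}

    def lookup(self, user_id):
        # Return the saved profile for this visitor, or None if unseen.
        return self._profiles.get(user_id)

    def save(self, user_profile):
        # Assumed profile shape:
        # {"user_id": ..., "experiment_bucket_map": {exp_id: {"variation_id": ...}}}
        self._profiles[user_profile["user_id"]] = user_profile


service = InMemoryUserProfileService()
service.save({
    "user_id": "visitor_1",
    "experiment_bucket_map": {"exp_1": {"variation_id": "var_b"}},
})
print(service.lookup("visitor_1")["experiment_bucket_map"])
```

Once a visitor's assignment is saved, later traffic reallocations do not move that visitor to a different variation.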
Modify an experiment when Stats Accelerator is enabled
It is possible to modify an experiment if you have Stats Accelerator enabled. However, there are some limitations you should be aware of.
Before starting your experiment, you can add or delete variations for Web, Personalization, and Full Stack experiments as long as you still have the minimum number of variations required by the algorithm you’ve selected. For Accelerate Impact, this number is two, while for Accelerate Learnings, it’s three.
You can also add or delete sections or section variations for multivariate tests, provided that you still have the minimum number of variations required by the algorithm you’re using.
Once you’ve started your experiment, you can add, stop, or pause variations in Web, Personalization, and Full Stack experiments. However, for a multivariate test, you can only add or delete sections. You cannot add or delete section variations once the experiment has begun.
FDR Control with Adaptive Sequential Experimental Design is a technical white paper on the mathematical foundation of Stats Accelerator.