- Optimizely X Web Experimentation
- Optimizely X Web Personalization
- Optimizely X Full Stack
THIS ARTICLE WILL HELP YOU:
- Understand Optimizely's Stats Accelerator, its algorithms, and how it affects your results
- Distinguish between the two Stats Accelerator algorithms
- Determine whether to use Stats Accelerator for your experiments, as well as which algorithm to use
- Enable Stats Accelerator (beta) for your account
Stats Accelerator helps you algorithmically capture more value from your experiments, either by reducing the time to statistical significance or by increasing the number of conversions collected. It does this by monitoring ongoing experiments and using machine learning to adjust traffic distribution among variations.
You may hear Stats Accelerator concepts described as the “multi-armed bandit” or “multi-armed bandit algorithms.” See the Glossary of Optimizely Terminology for definitions of important phrases and concepts.
Stats Accelerator algorithms
Stats Accelerator applies one of two machine learning algorithms (or optimization strategies) for the primary metric: Accelerate Learnings or Accelerate Impact. Think of these algorithms as two distinct strategies for optimization, each with its own advantages and use cases: Accelerate Learnings is a good choice when you want to create more variations but still reach statistical significance quickly, while Accelerate Impact works better when you want to weight visitor experiences toward the leading variation during the experiment lifecycle.
The Accelerate Learnings algorithm shortens experiment duration by showing more visitors the variations that have a better chance of reaching statistical significance. Accelerate Learnings attempts to discover as many significant variations as possible.
The advantage of this algorithm is that it will help maximize the number of insights from experiments in a given time frame, so you spend less time waiting for results.
To use the Accelerate Learnings algorithm, you'll need at least three variations, including the original or holdback (baseline) variation.
By contrast, the Accelerate Impact algorithm works to maximize the payoff of the experiment by showing more visitors the leading variations. This will help you extract as much value as possible from the leading variation during the experiment lifecycle, so you avoid the opportunity cost of showing sub-optimal experiences.
To use the Accelerate Impact algorithm, you'll need at least two variations, including the original or holdback (baseline) variation.
Metrics are often correlated, so optimizing one optimizes another (for example, revenue and conversion rate). However, if metrics are independent of each other, optimizing the allocation for the primary metric may come at the expense of the secondary metric.
Select the appropriate algorithm
If you run a lot of experiments, you face two challenges. First, data collection is costly. Time spent experimenting means you have less time to exploit the value of the eventual winner. Both algorithms solve this problem, either by reducing time to significance or maximally exploiting over-performing variations during an experiment.
Second, creating more than one or two variations can delay statistical significance longer than you might like. Accelerate Learnings allows you to be bold and create more variations, while shrinking the time to significance by quickly identifying the variations that have a chance to reach statistical significance.
Most experiments with a clear primary metric tracking unique conversions can benefit from Accelerate Learnings. Read our Stats Accelerator technical FAQ to learn more.
Here are a couple of cases that may be a better fit for Accelerate Impact:
Promotions and offers: users who sell consumer goods on their site often focus on driving higher conversion rates. One effective way to do this is to offer special promotions that run for a limited time. Using the Accelerate Impact algorithm (instead of running a standard A/B/n test) will send more traffic to the over-performing variations and less traffic to the underperforming variations.
Long-running campaigns: some Optimizely Personalization users have long-running campaigns to which they continually add variations for each experience. For example, an airline may deliver destination-specific experiences on the homepage based on past searches. Over time, they might add different images and messaging. For long-running Personalization campaigns, the goal is often to drive as many conversions as possible, making it a perfect fit for Accelerate Impact.
Stats Accelerator works with Optimizely X Web Experimentation, Personalization, and Full Stack. For Personalization, the Stats Accelerator makes adjustments to the traffic distribution among variations within an experience.
Because traffic distribution will be updated frequently, Full Stack customers should implement sticky bucketing to avoid exposing the same visitor to multiple variations.
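One way to picture sticky bucketing: once a visitor is assigned a variation, that assignment is looked up before any bucketing logic runs, so later traffic-distribution changes never rebucket them. The sketch below is a hypothetical illustration (in Optimizely's Full Stack SDKs this is handled via a user profile service); `StickyBucketer` and its in-memory storage dict are illustrative, not SDK APIs.

```python
import hashlib

class StickyBucketer:
    """Minimal sticky-bucketing sketch: a visitor keeps their first
    assigned variation even as traffic weights change. The dict stands
    in for a durable store (cookie, database, user profile service)."""

    def __init__(self, variations):
        self.variations = variations   # e.g. ["original", "v1", "v2"]
        self.store = {}                # user_id -> saved variation key

    def get_variation(self, user_id, weights):
        # Return the saved assignment if this visitor was bucketed before.
        if user_id in self.store:
            return self.store[user_id]
        # Otherwise hash the user id into [0, 1) and walk the cumulative
        # weight boundaries (weights are assumed to sum to 1.0).
        point = int(hashlib.md5(user_id.encode()).hexdigest(), 16) / 16**32
        cumulative = 0.0
        chosen = self.variations[-1]
        for variation, weight in zip(self.variations, weights):
            cumulative += weight
            if point < cumulative:
                chosen = variation
                break
        self.store[user_id] = chosen
        return chosen
```

Because the stored assignment is checked first, calling `get_variation` again with updated weights returns the same variation for a returning visitor.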
To launch Stats Accelerator and implement the best algorithm for your experiment or personalization experience, navigate to the Traffic Allocation tab and select the algorithm you want to use from the Distribution Mode dropdown list.
Impact on reporting and results
When Stats Accelerator is enabled, the experiment's results will differ from other experiments in three visible ways:
Stats Accelerator adjusts the percentage of visitors who see each variation. This means visitor counts will reflect the distribution decisions of the Stats Accelerator.
Stats Accelerator experiments use a different calculation to measure the difference in conversion rates between variations: weighted improvement. Weighted improvement is an estimate of the true difference in conversion rates, derived from inspecting the individual time intervals between traffic allocation adjustments. See the last question in the Technical FAQ for details.
Stats Accelerator experiments and campaigns use absolute improvement instead of relative improvement in results to avoid statistical bias and to further accelerate time to significance.
Relative improvement is computed as: (variation conversion rate - baseline conversion rate) / baseline conversion rate
Absolute improvement is computed as: variation conversion rate - baseline conversion rate
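As a quick worked illustration, using hypothetical conversion rates of 10% for the baseline and 12% for the variation:

```python
baseline_rate = 0.10    # hypothetical baseline conversion rate
variation_rate = 0.12   # hypothetical variation conversion rate

# Absolute improvement: the raw difference in conversion rates.
absolute_improvement = variation_rate - baseline_rate        # ~0.02 (2 points)

# Relative improvement: that difference as a fraction of the baseline.
relative_improvement = absolute_improvement / baseline_rate  # ~0.20 (20%)
```

The same 2-percentage-point lift reads as a 20% relative improvement; Stats Accelerator experiments report the absolute form.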
Stats Engine will continue to decide when a variation has a statistically significant difference from the control, just as it always has. We would never compromise statistical validity by introducing a new feature. But because some differences are easier to spot than others, each variation requires a different number of samples to reach significance.
For the Accelerate Learnings approach, Stats Accelerator decides how many samples each variation should be allocated in real-time to get the same statistically significant results as standard A/B/n testing, but in less time. These algorithms are only compatible with always-valid p-values, such as those used in Stats Engine, that hold with all sample sizes and support continuous peeking/monitoring. This means that you may use the Results page for Stats Accelerator-enabled experiments just like any other experiment.
Stats Accelerator is not a single algorithm, but a suite of algorithms that each adapts its allocation for a different specified goal.
For tasks that balance exploration versus exploitation (Accelerate Impact), Optimizely uses a procedure inspired by Thompson Sampling, which is known to be optimal in this regime (Russo, Van Roy 2013).
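Thompson Sampling itself is easy to sketch for a binary metric. The toy below is not Optimizely's implementation, just an illustration of the idea: sample each variation's conversion rate from its Beta posterior, and split traffic in proportion to how often each variation's sample wins.

```python
import random

def thompson_allocate(successes, failures, draws=10000, seed=0):
    """Estimate a traffic split via Thompson Sampling for Bernoulli
    metrics: draw each variation's rate from Beta(successes + 1,
    failures + 1) and count how often each draw is the maximum."""
    rng = random.Random(seed)
    wins = [0] * len(successes)
    for _ in range(draws):
        samples = [rng.betavariate(s + 1, f + 1)    # Beta(1, 1) prior
                   for s, f in zip(successes, failures)]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]

# Variation 1 converts at ~6% vs ~3%, so it should receive most traffic.
split = thompson_allocate(successes=[30, 60], failures=[970, 940])
```

Because the posterior for the better-performing variation dominates, the resulting split sends most visitors its way while still occasionally sampling the other arm.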
Optimizely also draws from the research area of multi-armed bandits. Specifically, for pure-exploration tasks (Accelerate Learnings), such as discovering all variants that have statistically significant differences from the control, the algorithms in use are based on the popular upper confidence bound heuristic, known to be optimal for pure-exploration tasks (Jamieson, Malloy, Nowak, Bubeck 2014).
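The upper-confidence-bound idea can be sketched in a few lines: score each variation by its observed rate plus an uncertainty bonus that shrinks as data accumulates, and show the next visitor the highest-scoring variation. This is the generic UCB1 heuristic for illustration only, not Optimizely's production algorithm.

```python
import math

def ucb_next_variation(conversions, visitors):
    """Pick the next variation to sample: observed conversion rate plus
    an exploration bonus that shrinks as a variation accumulates
    visitors, so under-sampled variations keep receiving traffic until
    their confidence bounds separate from the others."""
    total = sum(visitors)
    scores = []
    for c, n in zip(conversions, visitors):
        if n == 0:
            # len(scores) is the index of the current variation;
            # always try an unseen variation first.
            return len(scores)
        bonus = math.sqrt(2 * math.log(max(total, 2)) / n)
        scores.append(c / n + bonus)
    return scores.index(max(scores))
```

With equal sample sizes the bonus terms match, so the higher observed rate wins; an unexplored variation always gets priority.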
This depends on the variations you are exploring. Users typically achieve statistical significance two to three times faster than standard A/B/n testing with Accelerate Learnings. This means with the same amount of traffic, you can reach significance using two to three times as many variants at a time as was possible with standard A/B/n testing.
The model that drives Stats Accelerator is updated hourly. Even for Optimizely users with extremely high traffic, this is more than sufficient to capture the full benefit of a dynamic, adaptive allocation. If you require a higher or lower frequency of model updates, please let us know.
There is no adverse impact to selecting another baseline, but the numbers may be difficult to interpret. We suggest keeping the original baseline when you interpret Results data.
The Stats Accelerator scheme reacts and adapts to the primary metric. If you change the primary metric mid-experiment, the Stats Accelerator scheme will change its policy to optimize that metric. For this reason, we suggest you do not change the primary metric once you begin the experiment or campaign.
If you pause or stop a variation, Stats Accelerator will ignore that variation's results data when adjusting traffic distribution among the remaining live variations.
For numeric metrics like revenue, the number of parameters needed to fully describe the distribution may be unbounded. In practice, we use robust estimators for the first few moments (for example, the mean, variance, and skew) to construct confidence bounds that are used just like those of binary metrics.
Stats Accelerator will automatically adjust traffic distribution between variations within campaign experiences. This will not affect the holdback. To maximize the benefit of Accelerate Learnings, you should increase your holdback to a level that would normally represent uniform distribution. For example, if you have 3 variations and a holdback, consider a 25% holdback.
In simple terms, if your goal is to learn whether any variations are better or worse than the baseline and take actions that have longer-term impact to your business based on this information, use Accelerate Learnings. If, on the other hand, you just want to maximize conversions among these variations, choose Accelerate Impact.
In traditional A/B/n testing, a control is defined alongside a number of variants that are to be judged better or worse than the control. Typically, such an experiment is run on a fraction of web traffic to determine the potential benefit or detriment of using a particular variant instead of the control. If the absolute difference between a variant and the control is large, only a small number of impressions of that variant are necessary to confidently declare the variant different (and by how much). When the difference is small, more impressions are necessary to spot it. The goal of Accelerate Learnings is to spot the big differences quickly and divert more traffic to the variants that require more impressions to attain statistical significance. Although nothing can ever be said with 100% certainty in statistical testing, we guarantee that the false discovery rate (FDR) is controlled, which bounds the expected proportion of variants falsely claimed as having a statistically significant difference when there is no true difference (users commonly control the FDR at 5%).
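FDR control can be illustrated with the classic Benjamini-Hochberg step-up procedure. Stats Engine uses a sequential variant of FDR control rather than this exact procedure, so treat the sketch below purely as an illustration of the concept.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of hypotheses declared significant while
    controlling the false discovery rate at level alpha: sort p-values,
    find the largest rank k with p_(k) <= alpha * k / m, and declare
    the k smallest p-values significant."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = -1
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff]) if cutoff > 0 else []
```

For example, with p-values `[0.001, 0.02, 0.8]` at `alpha=0.05`, the first two hypotheses are declared significant and the third is not.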
In a nutshell, use Accelerate Learnings when you have a control or default and you’re investigating optional variants before committing to one and replacing the control. In Accelerate Impact, the variants and control (if it exists) are on equal footing. Instead of merely trying to reach statistical significance on the hypotheses that each variant is either different or the same as the control, Accelerate Impact attempts to adapt the allocation to the variant that has the best performance.
Time variation is a dependence of a metric's underlying distribution on time. More simply, time variation occurs when a metric's conversion rate changes over time. Stats Engine assumes the data is identically distributed over time.
Time variation is caused by a change in the underlying conditions that affect visitor behavior. Examples include more purchasing visitors on weekends; an aggressive new discount that yields more customer purchases; or a marketing campaign in a new market that brings in a large number of visitors with different interaction behavior than existing visitors.
We assume identically distributed data because this assumption enables us to support continuous monitoring and faster learning (see the Stats Engine article for details). However, Stats Engine has a built-in mechanism to detect violations of this assumption. When a violation is detected, Stats Engine updates the statistical significance calculations. We call this a “stats reset.”
Time variation affects experiments using Stats Accelerator because the algorithms adjust the percentage of traffic exposed to each variation during the experiment. This can introduce bias in the estimated improvement, known as Simpson's Paradox. The result is that stats resets may be much more likely to occur.
The solution is to change the way the Improvement number is calculated. Specifically, we compare the conversion rates of the baseline and variation(s) within each interval between traffic allocation changes. Then, we compute statistics using weighted averages across these time intervals. For example, the difference of observed conversion rates is scaled by the number of visitors in each interval to generate an estimate of the true difference in conversion rates. This estimate is represented as weighted improvement.
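That weighted-average idea can be sketched as follows. The exact weighting scheme Optimizely uses may differ; here each interval is weighted by its visitor count, as the description above suggests.

```python
def weighted_improvement(intervals):
    """Estimate the true difference in conversion rates from
    per-interval counts, where each interval spans one
    traffic-allocation setting. intervals: list of
    (baseline_conversions, baseline_visitors,
     variation_conversions, variation_visitors)."""
    weighted_diff = 0.0
    total_weight = 0
    for b_conv, b_vis, v_conv, v_vis in intervals:
        diff = v_conv / v_vis - b_conv / b_vis   # per-interval rate difference
        weight = b_vis + v_vis                   # visitors in this interval
        weighted_diff += weight * diff
        total_weight += weight
    return weighted_diff / total_weight

# Two intervals with opposite allocations but a consistent 5-point lift:
estimate = weighted_improvement([(10, 100, 45, 300), (30, 300, 15, 100)])
```

Comparing baseline and variation within each interval, then averaging, keeps the shifting traffic split from biasing the estimate the way a single pooled comparison would.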
Furthermore, time variation has less of an effect on the Accelerate Impact approach because it does not seek to reduce time to statistical significance declaration. The Accelerate Impact approach seeks to exploit the best-performing variation, weighting recent data more heavily to account for uncertainty. Therefore, the business impact of a stats reset is lower than a stats reset on an experiment that is trying to achieve statistical significance.
To mitigate the effects of time variation even further for Accelerate Impact, we are implementing an exponential decay function that weights recent visitor behavior more heavily, adapting to time variation more quickly. For both the Accelerate Learnings and Accelerate Impact algorithms, we reserve a portion of the traffic for pure exploration so that we can detect when time variation happens.
An exponential decay function, which we’ve implemented for the Accelerate Impact algorithm, is a good approach to addressing time variation. Exponential decay is a smooth mathematical function that gives less weight to earlier observations and more weight to recent ones. It is broadly used to model how early observations gradually become less relevant as trends change over time.
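Exponential decay is straightforward to sketch. In the hypothetical estimator below, an observation `age` periods old is weighted by `0.5 ** (age / half_life)`; the half-life value and the estimator itself are illustrative, not Optimizely's implementation.

```python
def decayed_conversion_rate(observations, half_life=7.0):
    """Conversion-rate estimate where older data counts for less.
    observations: list of (age, conversions, visitors); age is how
    many periods ago the interval was observed (0 = most recent)."""
    weighted_conv = weighted_vis = 0.0
    for age, conversions, visitors in observations:
        w = 0.5 ** (age / half_life)  # weight halves every half_life periods
        weighted_conv += w * conversions
        weighted_vis += w * visitors
    return weighted_conv / weighted_vis

# A rate that drifted from 5% (four weeks ago) up to 20% (now):
recent_weighted = decayed_conversion_rate([(0, 20, 100), (28, 5, 100)])
```

Here the decayed estimate (about 0.19) tracks the recent 20% rate, whereas a plain pooled rate over the same data would report 12.5%.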