- Understand what Optimizely’s multi-armed bandit optimization is and how it works
- Decide when to choose a multi-armed bandit optimization instead of an A/B experiment
In Optimizely, there are two types of tests you can run: experiments and optimizations.
Experiments are designed to test a hypothesis or validate a claim. The goal is to determine whether a variation is fundamentally different (via statistical significance) with the aim of generalizing learnings from that knowledge into future deployments or experiments.
Optimizations are set-it-and-forget-it algorithms designed to squeeze as much lift from a set of variations as possible without concern for visibility into whether a variation is fundamentally better or worse. Therefore, these algorithms ignore statistical significance in favor of maximizing your goal.
Multi-armed bandit optimizations aim to maximize performance across your variations with respect to your primary metric by dynamically re-allocating traffic to whichever variation is performing the best. This will help you extract as much value as possible from the leading variation during the experiment lifecycle, so you avoid the opportunity cost of showing sub-optimal experiences.
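To make the idea of dynamic re-allocation concrete, here is a minimal sketch of how a bandit might turn the results observed so far into traffic weights. It uses Thompson sampling with Beta posteriors, which is a common textbook approach and an assumption here: this article does not state which algorithm Optimizely uses. The function name `allocation_weights` and the example counts are hypothetical.

```python
import random

def allocation_weights(conversions, visitors, draws=5000, seed=0):
    """Estimate, for each variation, the probability that it is the best one.

    Assumption: each variation's conversion rate gets a
    Beta(1 + conversions, 1 + non-conversions) posterior (uniform prior).
    The returned probabilities can be used as traffic weights.
    """
    rng = random.Random(seed)
    wins = [0] * len(visitors)
    for _ in range(draws):
        # Draw one plausible conversion rate per variation.
        samples = [
            rng.betavariate(1 + c, 1 + (n - c))
            for c, n in zip(conversions, visitors)
        ]
        # Credit the variation whose sampled rate is highest in this draw.
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]

# Hypothetical interim results: 100 visitors per variation so far.
weights = allocation_weights(conversions=[50, 45, 55], visitors=[100, 100, 100])
print(weights)
```

In this sketch the variation with the most conversions receives the largest share of traffic, but the others keep a nonzero share, so the bandit can still recover if the early leader turns out to be a fluke.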
Because multi-armed bandit optimizations are not intended to produce statistical significance, the usual statistical significance and confidence intervals generated by an A/B experiment will not be shown.
Here are a couple of cases that may be a better fit for a multi-armed bandit optimization than a traditional A/B experiment:
Promotions and offers: users who sell consumer goods on their site often focus on driving higher conversion rates. One effective way to do this is to offer special promotions that run for a limited time. Using a multi-armed bandit optimization (instead of running a standard A/B experiment) will send more traffic to the over-performing variations and less traffic to the underperforming variations.
Headline testing: headlines are short-lived content that loses relevance after a fixed amount of time. If a headline experiment takes as long to reach statistical significance as the headline's lifespan, then any learnings gained from the experiment will be irrelevant going forward. A multi-armed bandit optimization is therefore a natural choice: it lets you maximize your impact without having to balance experiment runtime against the natural lifespan of a headline.
Still confused? We’ve provided a short demo to illustrate the main differences between an ordinary A/B experiment and a multi-armed bandit optimization. In this head-to-head comparison, we send simulated data to both an A/B test with a fixed traffic distribution and a multi-armed bandit optimization. We then observe the traffic distribution over time, as well as the cumulative count of conversions, for each mode. The true conversion rates driving the simulated data are:
Variation 1: 50%
Variation 2: 45%
Variation 3: 55%
From this demo we can see that the multi-armed bandit algorithm senses early on that Variation 3 is the higher-performing variation. Even though this signal does not yet have high statistical significance, the algorithm still begins to push traffic to Variation 3 to exploit the perceived advantage and therefore gain more conversions. For the ordinary A/B experiment, the traffic distribution remains fixed in order to arrive at a statistically significant signal earlier.
By the end of the simulation, we see that the multi-armed bandit has optimized the experiment to achieve roughly 700 more conversions than if we had held the traffic constant.
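The head-to-head comparison can be approximated with a short simulation. The sketch below uses the true conversion rates listed above, but everything else is an assumption: the visitor count (`VISITORS`), the random seed, and the use of Thompson sampling as the bandit are illustrative choices, not details of the demo. It is not expected to reproduce the demo's exact figures.

```python
import random

TRUE_RATES = [0.50, 0.45, 0.55]  # Variations 1-3, as listed in the demo
VISITORS = 10_000                # assumed; the demo's visitor count is not stated

def run_fixed(rng):
    """A/B mode: rotate visitors evenly across the variations."""
    conversions = 0
    for i in range(VISITORS):
        arm = i % len(TRUE_RATES)
        if rng.random() < TRUE_RATES[arm]:
            conversions += 1
    return conversions

def run_bandit(rng):
    """Bandit mode (assumed Thompson sampling): send each visitor to the
    variation whose sampled conversion rate is currently highest."""
    wins = [0] * len(TRUE_RATES)
    losses = [0] * len(TRUE_RATES)
    traffic = [0] * len(TRUE_RATES)
    for _ in range(VISITORS):
        samples = [rng.betavariate(1 + w, 1 + l) for w, l in zip(wins, losses)]
        arm = samples.index(max(samples))
        traffic[arm] += 1
        if rng.random() < TRUE_RATES[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return sum(wins), traffic

fixed_total = run_fixed(random.Random(1))
bandit_total, bandit_traffic = run_bandit(random.Random(1))
print("fixed conversions: ", fixed_total)
print("bandit conversions:", bandit_total)
print("bandit traffic per variation:", bandit_traffic)
```

Under these assumptions the bandit ends up sending most visitors to Variation 3 and collects more conversions than the fixed split, which mirrors the qualitative result of the demo.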
Because equal traffic distribution is optimal for quickly detecting a winner or loser, statistical significance computed on data from the multi-armed bandit would show lower certainty than statistical significance computed on data from the A/B test. However, because we are only interested in maximizing conversions, statistical significance is irrelevant to our goals, and we do not show it on the results page.
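One way to see why an uneven split yields less certainty is to compare the standard error of the difference between two conversion rates under an equal split and a skewed split of the same total traffic. This is a rough illustration using the standard two-proportion formula, not the statistics Optimizely computes; the function name `diff_standard_error` and the visitor counts are hypothetical.

```python
import math

def diff_standard_error(p1, n1, p2, n2):
    """Standard error of (p2 - p1) for two independent proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Same 1,000 visitors in total, split two different ways (hypothetical numbers).
se_equal = diff_standard_error(0.50, 500, 0.55, 500)   # A/B-style even split
se_skewed = diff_standard_error(0.50, 100, 0.55, 900)  # bandit-style skewed split

print(se_equal, se_skewed)
```

The skewed split produces the larger standard error because the variation that receives little traffic is measured imprecisely, so the same observed difference is harder to distinguish from noise.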