Multi-armed bandits: When to experiment and when to optimize
 Understand what Optimizely’s multi-armed bandit optimization is and how it works
 Decide when to choose a multi-armed bandit optimization instead of an A/B experiment
In Optimizely, you can run tests to do one of two things: either experiment or optimize.

When you experiment, you’re trying to test a hypothesis or validate a claim. The goal is to determine whether a variation is fundamentally different (via statistical significance) with the aim of generalizing learnings from that knowledge into future deployments or experiments.

When you optimize, on the other hand, you’re using a set-it-and-forget-it algorithm designed to squeeze as much lift from a set of variations as possible, without concern for visibility into whether a variation is fundamentally better or worse. That means you’re ignoring statistical significance in favor of maximizing your goal.
Multi-armed bandit (MAB) optimizations aim to maximize performance of your primary metric across all your variations. They do this by dynamically reallocating traffic to whichever variation is currently performing best. This will help you extract as much value as possible from the leading variation during the experiment lifecycle, so you avoid the opportunity cost of showing suboptimal experiences.
In other words, the better a variation does, the more traffic a multi-armed bandit will send its way. A/B tests don't do this. Instead, they keep traffic allocation constant for the experiment's entire lifetime, no matter how each variation performs.
For a thorough explanation of this difference, see the demonstration at the end of this article.
When to use a multi-armed bandit
Here are a couple of cases that may be a better fit for a multi-armed bandit optimization than a traditional A/B experiment:

Promotions and offers: users who sell consumer goods on their site often focus on driving higher conversion rates. One effective way to do this is to offer special promotions that run for a limited time. Using a multi-armed bandit optimization (instead of running a standard A/B experiment) will send more traffic to the overperforming variations and less traffic to the underperforming variations while the promotion is still live.

Headline testing: headlines are short-lived content that loses relevance after a fixed amount of time. If a headline experiment takes just as long to reach statistical significance as the lifespan of a headline, then any learnings gained from the experiment will be irrelevant going forward. Therefore, a multi-armed bandit optimization is a natural choice to allow you to maximize your impact without worrying about balancing experiment runtime and the natural lifespan of a headline.
Set up a multi-armed bandit optimization
To set up multi-armed bandit optimization on your experiment, select Multi-Armed Bandit from the Create New... dropdown when you first create your optimization.
You can use multi-armed bandit optimizations in Full Stack; however, you can't use them for feature rollouts in Feature Management.
Interpreting MAB results
If you're an Optimizely user, you probably have a good understanding of how to interpret the results of a traditional A/B test. Those interpretations won't work for MABs, for two important reasons:

Multi-armed bandits don't generate statistical significance, and

Multi-armed bandits don't use a control or a baseline experience.
Because of this, the MAB results page focuses on improvement over equal allocation as its primary summary of your experiment's performance.
MABs do not show statistical significance
With a traditional A/B test, the goal is exploration: collecting data to discover if a variation performs better or worse than the control. This is expressed through the concept of statistical significance.
Statistical significance tells you whether a change had the effect you expected. You can use those lessons to make your variations better each time. A fixed traffic allocation strategy is usually the best way to reduce the time it takes to reach a statistically significant result.
On the other hand, Optimizely’s multi-armed bandit algorithms are designed for exploitation: MABs will aggressively push traffic to whichever variations are performing best at any given moment, because the MAB doesn’t consider the reason for that superior performance to be all that important.
Since multi-armed bandits essentially ignore statistical significance, Optimizely will do the same. This is why statistical significance does not appear on the results page for MABs: it avoids confusion about the purpose and meaning of multi-armed bandit optimizations.
MABs do not use a baseline
In a traditional A/B test, statistical significance is calculated relative to the performance of one baseline experience. But MABs don’t do this. They’re intended to explicitly evaluate the tradeoffs between all variations at once, which means there is no control or baseline experience to compare to.
What’s more, MABs are "set-and-forget" optimizations. In an A/B test, you follow up an experiment with a decision: do you deploy a winning variation, or stick with the control? But since MABs continuously make these decisions throughout the experiment’s lifetime, there’s never any need for a baseline reference point for that decision, because you'll never need to make it yourself.
Improvement over equal allocation
Improvement over equal allocation represents the gain in total conversions in the current MAB test over a hypothetical state, in which an A/B test with fixed, equal traffic allocation had been run instead. We can estimate this improvement using the following formula:
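Written in symbols, one reconstruction consistent with the explanation that follows is shown here. The names $c_i$ (conversions observed in arm $i$), $n_i$ (visitors sent to arm $i$) and $\hat{p}_i$ (observed conversion rate of arm $i$) are this sketch's own labels, not official Optimizely notation; $N$ and $k$ are the total visitors and number of arms as used in the text.

```latex
\hat{p}_i = \frac{c_i}{n_i}, \qquad N = \sum_{i=1}^{k} n_i

\text{Improvement over equal allocation}
  \;\approx\; \underbrace{\sum_{i=1}^{k} c_i}_{\text{conversions observed in the MAB}}
  \;-\; \underbrace{\sum_{i=1}^{k} \hat{p}_i \cdot \frac{N}{k}}_{\text{estimated conversions under equal allocation}}
```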
In other words, we assume that the observed conversion rate using the most recent data is an accurate estimate of the true conversion rate for that arm of the experiment. Then, in the case that our pool of N total visitors is evenly distributed across the k arms of the experiment so that there are N/k total visitors in each experience, we can apply the observed conversion rate to each pool of visitors to arrive at a good estimate of how many total conversions would have been generated by each arm in an equal-allocation A/B experiment.
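The estimate described above can be worked through with a few lines of Python. This is a minimal sketch, not Optimizely's implementation: the function name, its arguments, and the sample counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def improvement_over_equal_allocation(visitors, conversions):
    """Estimate extra conversions versus a hypothetical equal-allocation A/B test.

    visitors[i] and conversions[i] are the observed counts for arm i.
    (Hypothetical helper for illustration; not part of any Optimizely API.)
    """
    if not visitors or len(visitors) != len(conversions):
        raise ValueError("visitors and conversions must be non-empty and equal in length")

    k = len(visitors)                 # number of arms
    total_visitors = sum(visitors)    # N
    per_arm = total_visitors / k      # N / k visitors per arm under equal allocation

    # Observed conversion rate per arm; an arm that received no traffic contributes 0.
    rates = [c / v if v else 0.0 for v, c in zip(visitors, conversions)]

    # Conversions each arm would be expected to produce with N / k visitors.
    hypothetical_total = sum(rate * per_arm for rate in rates)

    return sum(conversions) - hypothetical_total


# Made-up example: the bandit sent most of 10,000 visitors to the last arm.
visitors = [1000, 1000, 1000, 7000]
conversions = [500, 500, 450, 3850]
print(improvement_over_equal_allocation(visitors, conversions))
```

With these made-up counts, the observed rates are 50%, 50%, 45% and 55%. Applying each rate to 2,500 visitors gives an estimated 5,000 conversions under equal allocation, against 5,300 actually observed, so the estimated improvement is about 300 conversions.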
MAB optimization vs. A/B testing: a demonstration
In this head-to-head comparison, simulated data is sent to both an A/B test with fixed traffic distribution and a multi-armed bandit experiment. Traffic distribution over time and the cumulative count of conversions for each mode are both observed. The true conversion rates driving the simulated data are:

Original: 50%

Variation 1: 50%

Variation 2: 45%

Variation 3: 55%
The multi-armed bandit algorithm senses that Variation 3 is higher-performing from the start. Even though it doesn’t yet have high statistical significance for this signal, it still begins to push traffic to Variation 3 in order to exploit the perceived advantage and gain more conversions.
For the ordinary A/B experiment, the traffic distribution remains fixed in order to more quickly arrive at a statistically significant result. Because fixed traffic allocations are optimal for reaching statistical significance, MAB-driven experiments generally take longer to find winners and losers than A/B tests.
By the end of the simulation, the multi-armed bandit has optimized the experiment to achieve roughly 700 more conversions than if traffic had been held constant.
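A comparison in the same spirit can be sketched in Python using the true conversion rates listed above. Note the assumptions: Thompson sampling with uniform Beta(1, 1) priors is used here as one common bandit strategy and is not presented as Optimizely's algorithm. The visitor count, seeds, and function names are also hypothetical, so the numbers this sketch produces will not match the figure quoted above.

```python
import random

# True conversion rates from the demonstration: Original, Variation 1, 2, 3.
TRUE_RATES = [0.50, 0.50, 0.45, 0.55]


def run_fixed(n_visitors, rates, rng):
    """Fixed, equal allocation: visitors are assigned to arms round-robin."""
    pulls = [0] * len(rates)
    wins = [0] * len(rates)
    for t in range(n_visitors):
        arm = t % len(rates)
        pulls[arm] += 1
        if rng.random() < rates[arm]:   # simulate a conversion
            wins[arm] += 1
    return pulls, wins


def run_thompson(n_visitors, rates, rng):
    """Bandit allocation via Thompson sampling (an assumed, illustrative strategy)."""
    k = len(rates)
    pulls = [0] * k
    wins = [0] * k
    for _ in range(n_visitors):
        # Draw one plausible conversion rate per arm from its Beta posterior,
        # then send this visitor to the arm with the highest draw.
        draws = [rng.betavariate(1 + wins[i], 1 + pulls[i] - wins[i]) for i in range(k)]
        arm = draws.index(max(draws))
        pulls[arm] += 1
        if rng.random() < rates[arm]:   # simulate a conversion
            wins[arm] += 1
    return pulls, wins


if __name__ == "__main__":
    n = 20000
    fixed_pulls, fixed_wins = run_fixed(n, TRUE_RATES, random.Random(1))
    mab_pulls, mab_wins = run_thompson(n, TRUE_RATES, random.Random(1))
    print("fixed traffic:", fixed_pulls, "conversions:", sum(fixed_wins))
    print("bandit traffic:", mab_pulls, "conversions:", sum(mab_wins))
```

Under fixed allocation every arm receives exactly a quarter of the traffic, while the bandit run typically shifts most visitors toward Variation 3 and ends with more total conversions. The exact gap depends on the seed and the number of visitors.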