Maximize lift with multi-armed bandit optimizations
 Understand what Optimizely’s multi-armed bandit optimization is and how it works
 Decide when to choose a multi-armed bandit optimization instead of an A/B experiment
If you're an Optimizely user, you probably have a good understanding of how to interpret the results of a traditional A/B test. Those interpretations don't carry over to multi-armed bandits (MABs), for two important reasons:

Multi-armed bandits don't generate statistical significance, and

Multi-armed bandits don't use a control or a baseline experience.
Instead of statistical significance, the MAB results page reports improvement over equal allocation as its primary summary of your optimization's performance. This article breaks down the key differences between multi-armed bandits and traditional A/B tests and closes with a demonstration of how each approach unfolds in an identical situation.
You can use multi-armed bandit optimizations in Full Stack; however, you can't use them for feature rollouts in Feature Management.
Why MABs do not show statistical significance
With a traditional A/B test, the goal is exploration: collecting data to discover whether a variation performs better or worse than the control. That answer is expressed through statistical significance.
Statistical significance tells you whether a change had the effect you expected, and you can use those lessons to improve your variations with each iteration. Fixed traffic allocation is usually the fastest way to reach a statistically significant result.
By contrast, Optimizely’s multi-armed bandit algorithms are designed for exploitation: the MAB aggressively pushes traffic to whichever variations are performing best at any given moment, because it doesn't treat the reason for that superior performance as important.
Because multi-armed bandits effectively ignore statistical significance, Optimizely does the same. Statistical significance does not appear on the MAB results page, which avoids confusion about the purpose and meaning of multi-armed bandit optimizations.
Why MABs do not use a baseline
In a traditional A/B test, statistical significance is calculated relative to the performance of a single baseline experience. MABs don’t work this way. They are designed to evaluate the tradeoffs among all variations at once, so there is no control or baseline experience to compare against.
What’s more, MABs are "set-and-forget" optimizations. After an A/B test, you follow up with a decision: do you deploy a winning variation, or stick with the control? Because a MAB makes these decisions continuously over the optimization’s lifetime, it never needs a baseline reference point for that decision, and you never have to make it yourself.
Improvement over equal allocation
Improvement over equal allocation is the gain in total conversions from the current MAB optimization compared with a hypothetical scenario in which an A/B test with fixed, equal traffic allocation had been run instead.
Optimizely estimates the reward under equal allocation by calculating the average reward per visitor for every arm in every time period, then multiplying that average by the number of visitors the arm would have received under equal allocation.
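Written as a formula, the estimate described above looks like this. The symbols are introduced here for illustration and are not Optimizely's own notation: epochs are indexed by t = 1, ..., T, arms by k = 1, ..., K, n_{k,t} and r_{k,t} are the visitors and total reward (for example, conversions) of arm k in epoch t, and N_t is the total traffic in epoch t.

```latex
\begin{aligned}
N_t &= \sum_{k=1}^{K} n_{k,t}, \qquad \bar{r}_{k,t} = \frac{r_{k,t}}{n_{k,t}} \\
\hat{R}_{\text{equal}} &= \sum_{t=1}^{T} \sum_{k=1}^{K} \frac{N_t}{K}\,\bar{r}_{k,t} \\
\text{Improvement} &= \sum_{t=1}^{T} \sum_{k=1}^{K} r_{k,t} \;-\; \hat{R}_{\text{equal}}
\end{aligned}
```

The first sum in the last line is the reward the bandit actually earned; the second term replaces each arm's actual traffic with an equal share N_t / K while keeping that arm's observed reward rate.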
To do this, Optimizely first breaks up the history of your optimization into a series of time spans, or epochs. Traffic allocation stays the same within each epoch. Optimizely then performs the following procedure:

1. For each epoch, Optimizely computes the average reward per visitor for each arm of your multi-armed bandit optimization.

2. The total traffic across all arms in that epoch is divided equally among the arms.

3. For each arm, Optimizely multiplies its equal share of traffic from step 2 by its average reward per visitor from step 1. This gives an estimate of the reward that arm would have earned in that epoch under equal allocation.

4. These estimates are summed over every arm and every epoch to produce an estimate of the total equal-allocation reward.

5. Finally, Optimizely subtracts this estimate from the total reward the algorithm actually produced. The difference is the improvement over equal allocation.
MAB optimization vs. A/B testing: a demonstration
In this head-to-head comparison, the same simulated data is sent to both an A/B test with fixed traffic distribution and a multi-armed bandit optimization. For each approach, the traffic distribution over time and the cumulative number of conversions are tracked. The true conversion rates behind the simulated data are:

Original: 50%

Variation 1: 50%

Variation 2: 45%

Variation 3: 55%
The multi-armed bandit algorithm detects early on that Variation 3 is the higher performer. Even without statistical significance for this signal (remember, the multi-armed bandit does not show statistical significance), it begins pushing traffic to Variation 3 to exploit the perceived advantage and gain more conversions.
In the ordinary A/B experiment, the traffic distribution stays fixed so that it reaches a statistically significant result more quickly. Because fixed traffic allocation is optimal for reaching statistical significance, MAB-driven experiments generally take longer than A/B tests to identify winners and losers.
By the end of the simulation, the multi-armed bandit has gained roughly 700 more conversions than it would have if traffic had been held constant.
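To make the comparison concrete, here is a small Python simulation that uses the true conversion rates listed above. It is a sketch under stated assumptions, not a reproduction of Optimizely's demonstration: the article does not say which algorithm drove that simulation, so this swaps in Thompson sampling with Beta posteriors (a standard bandit technique for binary conversion metrics). The visitor count, the seeds, and the function names are hypothetical, and the resulting conversion gap will not match the roughly 700 reported above.

```python
import random

# True conversion rates from the demonstration.
TRUE_RATES = {
    "Original": 0.50,
    "Variation 1": 0.50,
    "Variation 2": 0.45,
    "Variation 3": 0.55,
}


def simulate_fixed(visitors: int, rng: random.Random) -> int:
    """A/B test with fixed, equal allocation: visitors rotate through the arms."""
    arms = list(TRUE_RATES)
    conversions = 0
    for i in range(visitors):
        arm = arms[i % len(arms)]
        conversions += rng.random() < TRUE_RATES[arm]
    return conversions


def simulate_thompson(visitors: int, rng: random.Random) -> tuple[int, dict[str, int]]:
    """Bandit using Thompson sampling (an assumed stand-in, not Optimizely's stated method)."""
    arms = list(TRUE_RATES)
    successes = {arm: 0 for arm in arms}
    failures = {arm: 0 for arm in arms}
    for _ in range(visitors):
        # Draw a plausible conversion rate for each arm from its Beta(1+s, 1+f)
        # posterior and send the visitor to the arm with the highest draw.
        arm = max(arms, key=lambda a: rng.betavariate(1 + successes[a], 1 + failures[a]))
        if rng.random() < TRUE_RATES[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    traffic = {arm: successes[arm] + failures[arm] for arm in arms}
    return sum(successes.values()), traffic


if __name__ == "__main__":
    n = 20_000
    fixed = simulate_fixed(n, random.Random(1))
    bandit, traffic = simulate_thompson(n, random.Random(1))
    print(f"Fixed allocation conversions:  {fixed}")
    print(f"Bandit conversions:            {bandit}")
    print(f"Extra conversions from bandit: {bandit - fixed}")
    print(f"Bandit traffic per arm: {traffic}")
```

As in the demonstration, the bandit shifts most of its traffic to Variation 3 and collects more conversions, while the fixed allocation keeps sending a quarter of visitors to each arm, including the weakest one.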
FAQs
For numeric metrics, Optimizely uses a form of epsilon-greedy, in which a small fraction of traffic is allocated uniformly across all variations and the large remainder goes to the variation with the highest observed mean.
In Personalization, multi-armed bandits can be applied at the experience level; this works best when you have two variations aside from the holdback.
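The epsilon-greedy rule described above can be sketched as follows. This is an illustration only: the exploration rate of 0.1, the function names `choose_arm` and `record`, and the running-mean bookkeeping are assumptions, since the article does not state how large Optimizely's exploration fraction is or how it tracks per-variation means.

```python
import random


def choose_arm(means: dict[str, float], epsilon: float, rng: random.Random) -> str:
    """Epsilon-greedy: explore uniformly with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Explore: every variation is equally likely, including the current leader.
        return rng.choice(sorted(means))
    # Exploit: the variation with the highest observed mean of the numeric metric.
    return max(means, key=means.get)


def record(counts: dict[str, int], means: dict[str, float], arm: str, value: float) -> None:
    """Update one variation's running mean with a new observation (e.g. revenue)."""
    counts[arm] += 1
    means[arm] += (value - means[arm]) / counts[arm]


if __name__ == "__main__":
    rng = random.Random(7)
    means = {"Original": 12.0, "Variation 1": 14.5, "Variation 2": 13.1}
    picks = [choose_arm(means, 0.1, rng) for _ in range(10_000)]
    # The leader should get about 0.9 + 0.1 / 3 of traffic, roughly 93%.
    print(picks.count("Variation 1") / len(picks))
```

Because exploration is uniform across all variations, the leading variation receives slightly more than 1 - epsilon of the traffic: its exploitation share plus its own slice of the exploration traffic.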
You should not see a baseline variation when using MAB with a Web or Full Stack experiment.
It is not possible to change your primary metric in Optimizely X Web or Full Stack once your experiment has begun.
Optimizely also reserves a portion of traffic for pure exploration, so that changes in performance over time (time variation) are easier to detect.