- Optimizely X Web Experimentation
- Optimizely X Web Personalization
- Optimizely X Web Recommendations
THIS ARTICLE WILL HELP YOU:
- Use difference intervals and improvement intervals to analyze results
- Predict what behavior you should see from your results over time
Statistical significance tells you whether a variation is outperforming or underperforming the baseline, at whatever level of confidence you chose. Difference intervals tell you the range of values where the difference between the original and the variation actually lies, after removing typical fluctuation.
The difference interval is a confidence interval on the difference in conversion rates that you can expect to see when implementing a given variation. Think of it as your "margin of error" on the absolute difference between two conversion rates.
When a variation reaches statistical significance, its difference interval lies entirely above 0% for a winning variation, or entirely below 0% for a losing variation.
- A winning variation will have a difference interval that is completely above 0%.
- An inconclusive variation will have a difference interval that includes 0%.
- A losing variation will have a difference interval that is completely below 0%.
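The three cases above can be sketched as a small check on the interval's bounds. This is only an illustration: the function name `classify_interval` is hypothetical and not part of Optimizely, and the sample bounds are the ones used in this article's examples.

```python
def classify_interval(lower: float, upper: float) -> str:
    """Label a difference interval by where it sits relative to 0%.

    Hypothetical helper for illustration; bounds are in percentage points.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if lower > 0:
        return "winning"       # entire interval above 0%
    if upper < 0:
        return "losing"        # entire interval below 0%
    return "inconclusive"      # interval includes 0%


print(classify_interval(0.29, 4.33))    # winning
print(classify_interval(-2.41, -1.03))  # losing
print(classify_interval(-0.58, 3.78))   # inconclusive
```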
Optimizely sets your difference interval at the same level that you set your statistical significance threshold for the project. For example, if you accept 90% significance to declare a winner, you also accept 90% confidence that the interval is accurate.
Note that the difference interval represents the absolute difference in conversion rates, not the relative difference. In other words, if your baseline conversion rate is 10% and your variation conversion rate is 11%, then:
- The absolute difference in conversion rates is 1% (one percentage point)
- The relative difference in conversion rates is 10%; this is what Optimizely calls improvement
In the difference interval, you will see a range that contains 1%, not 10%.
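As a quick illustration of the two quantities, the arithmetic can be written out as follows. The function names are hypothetical and chosen for this sketch only; rates are expressed in percent.

```python
def absolute_difference(baseline: float, variation: float) -> float:
    """Absolute difference in conversion rates, in percentage points."""
    return variation - baseline


def relative_improvement(baseline: float, variation: float) -> float:
    """Relative difference (improvement) as a percentage of the baseline."""
    if baseline == 0:
        raise ValueError("baseline conversion rate must be non-zero")
    return (variation - baseline) * 100 / baseline


baseline, variation = 10.0, 11.0  # conversion rates in percent
print(round(absolute_difference(baseline, variation), 2))   # 1.0
print(round(relative_improvement(baseline, variation), 2))  # 10.0
```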
Example: "Winning" interval
In the example shown above, you can say that there is a 97% chance that the improvement you saw in the bottom variation is not due to chance. But the improvement Optimizely measured (+15.6%) may not be the exact improvement you see going forward.
In reality, if you implement that variation instead of the original, the difference in conversion rate will probably be between 0.29% and 4.33% over the baseline conversion rate. Compared to a baseline conversion rate of 14.81%, you're likely to see your variation convert in the range between 15.1% (14.81 + 0.29) and 19.14% (14.81 + 4.33).
Although the statistical significance is 97%, there's a 90% chance that the actual results will fall in the range of the difference interval. This is because the statistical significance setting for your project is 90%. The confidence level of your difference interval won't change as your variation's observed statistical significance changes. Rather, you'll generally see the interval become narrower as Optimizely collects more data.
In this experiment, the observed difference between the original (14.81%) and variation (17.12%) was 2.31%, which is within the difference interval. If we run this experiment again, we'll probably find that the difference between the baseline and the variation conversion rate is in the same range.
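The arithmetic in this example can be reproduced with a short sketch. The helper name `projected_rate_range` is hypothetical; it simply adds the interval bounds (in percentage points) to the baseline rate.

```python
def projected_rate_range(baseline_rate, interval_lower, interval_upper):
    """Shift the baseline rate by the difference interval's bounds.

    All values are in percent / percentage points. Hypothetical helper.
    """
    return baseline_rate + interval_lower, baseline_rate + interval_upper


low, high = projected_rate_range(14.81, 0.29, 4.33)
print(f"{low:.2f}% to {high:.2f}%")  # 15.10% to 19.14%

# The observed difference should sit inside the interval.
observed = 17.12 - 14.81
print(0.29 <= observed <= 4.33)  # True
```

The same function works for a losing interval, where both bounds are negative.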
Example: "Losing" interval
Let's look at another example, this time with the difference interval entirely below 0.
In this example, you can say that there is a 91% chance that the negative improvement you saw in the bottom variation is not due to chance. But the improvement Optimizely measured (-21.9%) may not be exactly what you see going forward.
In reality, if you implement the variation instead of the original, the difference in conversion rate will probably be between -2.41% and -1.03% relative to the baseline conversion rate. Compared to a baseline conversion rate of 7.86%, you're likely to see your variation convert in the range between 5.45% (7.86 - 2.41) and 6.83% (7.86 - 1.03).
In this experiment, the observed difference between the original (7.86%) and variation (6.14%) was -1.72%, which is within the difference interval. If we run this experiment again, the difference between the baseline and variation conversion rate will probably be in the same range.
Example: Inconclusive interval
If you need to stop a test early or have a low sample size, the difference interval will give you a rough idea of whether implementing that variation will have a positive or negative impact.
For this reason, when you see low statistical significance on certain goals, the difference interval can serve as another data point to help you make decisions. When you have an inconclusive goal, the interval will look like this:
Here, we can say that the difference in conversion rates for this variation will be between -0.58% and 3.78%. In other words, it could be either positive or negative.
When implementing this variation, you can say, "We implemented a test result that we are 90% confident is no more than 0.58% worse, and no more than 3.78% better, than the baseline," which allows you to make a business decision about whether implementing that variation would be worthwhile.
Another way you can interpret the difference interval is as worst case / middle ground / best case scenarios. For example, we are 90% confident that the worst case absolute difference between variation and baseline conversion rates is -0.58%, the best case is 3.78%, and a middle ground is 1.6%.
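The worst case / middle ground / best case reading can be sketched as below. This assumes the "middle ground" is the midpoint of the interval, which matches the 1.6% figure in the example; the function name `scenarios` is hypothetical.

```python
def scenarios(lower: float, upper: float) -> dict:
    """Summarize a difference interval as three scenarios.

    Assumption: the middle ground is taken as the interval's midpoint.
    """
    return {
        "worst": lower,
        "middle": (lower + upper) / 2,
        "best": upper,
    }


result = scenarios(-0.58, 3.78)
print(result["worst"], round(result["middle"], 2), result["best"])
# -0.58 1.6 3.78
```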
How statistical significance and difference intervals are connected
As discussed above, there is a 90% chance that the underlying difference between baseline and variation conversion rates—the difference that remains when you remove typical fluctuation—will fall in the range of the difference interval. This is because the statistical significance setting for your project is set to 90%. If you want to be more confident that the underlying conversion rate difference falls into the range of the difference interval, you can widen the difference interval by raising the statistical significance setting.
Higher values for the statistical significance setting correspond to wider difference intervals and a greater chance that the interval contains the underlying difference. Likewise, lower statistical significance settings correspond to narrower difference intervals and less chance that the interval contains the underlying difference.
In addition, there’s an even deeper connection between statistical significance and difference intervals. Because there is a 90% chance the underlying difference (after removing random fluctuation) lies within the difference interval, there is a 10% chance it does not. So when you have a difference interval that is completely to the left or right of zero, you know that there is at most a 10% chance the underlying conversion rate difference is zero. You can be at least 90% confident that the underlying difference is not zero, or equivalently, that what you observed is not due to random fluctuation. But this is exactly how we described statistical significance!
In conclusion, both the calling of "winning" and "losing" variations and the width of the confidence interval are controlled by the statistical significance setting. A winner (or loser) is called at the same time as the difference interval is completely to the right (or left) of zero.
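To make the link between confidence level, interval width, and zero-crossing concrete, here is a minimal sketch using a textbook normal-approximation (Wald) interval for the difference of two proportions. This is not Optimizely's calculation; it is a simpler fixed-sample stand-in used only to show the general relationship. The visitor and conversion counts are made-up illustration data, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wald_difference_interval(conv_a, n_a, conv_b, n_b, confidence):
    """Textbook Wald interval for (rate_b - rate_a), as proportions.

    Assumption: fixed-sample normal approximation, NOT Optimizely's method.
    """
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    p_a, p_b = conv_a / n_a, conv_b / n_b
    std_err = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    diff = p_b - p_a
    return diff - z * std_err, diff + z * std_err


# Hypothetical data: 150/1000 baseline conversions, 180/1000 variation conversions.
for level in (0.80, 0.90, 0.95):
    lower, upper = wald_difference_interval(150, 1000, 180, 1000, level)
    excludes_zero = lower > 0 or upper < 0
    print(f"{level:.0%}: [{lower:+.4f}, {upper:+.4f}] excludes zero: {excludes_zero}")
```

With this made-up data, the interval widens as the confidence level rises: it excludes zero at 90% but includes zero at 95%, so the same result would be conclusive under one setting and inconclusive under the other.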
Improvement intervals in Optimizely X
To reduce confusion, the Results page in Optimizely X shows the relative difference between variation and baseline measurement, not the absolute difference. This is true for all metrics, regardless of whether they are binary conversions or numeric.
In Optimizely X, an improvement interval of 1% to 10% means that the variation sees between 1% and 10% improvement over baseline. For example, if the baseline conversion rate is 25%, you can expect the variation conversion rate to fall between 25.25% and 27.5%.
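The conversion from a relative improvement interval to a range of conversion rates can be sketched as follows. The helper name is hypothetical; the inputs are the figures from the example above.

```python
def rate_range_from_improvement(baseline_rate, improvement_lower, improvement_upper):
    """Apply a relative improvement interval (in percent) to a baseline rate."""
    low = baseline_rate * (1 + improvement_lower / 100)
    high = baseline_rate * (1 + improvement_upper / 100)
    return low, high


low, high = rate_range_from_improvement(25.0, 1.0, 10.0)
print(f"{low:.2f}% to {high:.2f}%")  # 25.25% to 27.50%
```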
Note that significance and confidence intervals are still connected in the same way: your experiment reaches significance at exactly the same time that your confidence interval on improvement moves away from zero.
Estimated wait time and <1% significance
As your experiment or campaign runs, Optimizely estimates how long it will take for a test to reach conclusiveness.
This estimate is calculated based on the current, observed baseline and variation conversion rates. If those rates change, the estimate will adjust automatically.
You may see a large lift in the Improvement column accompanied by a significance of less than 1%. You'll also see a certain number of "visitors remaining." What does this mean?
In statistical terms, this experiment is currently underpowered: a relatively small number of visitors have entered the experiment, and Optimizely needs to gather more evidence to determine whether the change you see is a true difference in visitor behaviors, or chance. If you look in the Unique Conversions column, you'll probably see relatively low numbers.
Look at the "Search Only" variation in the example shown above. Optimizely needs approximately 6,500 more visitors to be exposed to that variation before it can decide on the difference in conversion rates between the "Search Only" and "Original" variations. Remember that the estimate of 6,500 visitors assumes that the observed conversion rate doesn't fluctuate. If more visitors see the variation but conversions decrease, your experiment will probably take more time, which means the "visitors remaining" estimate will increase. If conversions increase, Optimizely will need fewer visitors to be certain that the change in behavior is real.
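To see why a smaller observed difference means more visitors are needed, here is a rough sketch using the textbook fixed-sample formula for comparing two proportions. This is not how Optimizely computes "visitors remaining"; it is a simplified stand-in that shows the same direction of effect. The function name, the 80% power default, and the example rates are all assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variation(baseline_rate, variation_rate,
                           significance=0.90, power=0.80):
    """Textbook fixed-sample size per variation for two proportions.

    Assumption: classical two-sided test, NOT Optimizely's estimate.
    Rates are proportions (0.10 means 10%).
    """
    if baseline_rate == variation_rate:
        raise ValueError("rates must differ to detect a difference")
    z_alpha = NormalDist().inv_cdf(1 - (1 - significance) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = (baseline_rate * (1 - baseline_rate)
                + variation_rate * (1 - variation_rate))
    effect = variation_rate - baseline_rate
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# A larger observed difference needs fewer visitors than a smaller one.
print(visitors_per_variation(0.10, 0.12))
print(visitors_per_variation(0.10, 0.11))
```

Halving the difference between the rates roughly quadruples the required visitors in this formula, which is consistent with the estimate growing when the observed lift shrinks.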
To learn more about the importance of sample size, see our article on how long to run a test.
Unlike many testing tools, Optimizely's stats engine uses a statistical approach that removes the need to decide on a sample size and minimum detectable effect (MDE) before starting a test. You don't have to commit to large sample sizes ahead of time, and you can check on results whenever you want!
However, many optimization programs estimate how long tests take to run so they can build robust roadmaps. Use our sample size calculator to estimate how many visitors you'll need for a given test. Learn more about choosing a minimum detectable effect for our calculator in this article.
Learn more about statistical significance in Optimizely