- Interpret the results of winning, losing, and inconclusive experiments
- Segment by audience for a granular view of customer behaviors
- Evaluate how the results of this test affect other valuable revenue streams
- Gather sufficient data before deciding to stop an experiment
So, you designed and ran your test and now you have a dashboard full of results. Congratulations! The results of your experiment - whether winning, losing, or inconclusive - are a valuable resource that you'll use to feed the iterative cycle of your optimization program.
At this point, you may be tempted to jump straight to taking action: deploying changes to the site and creating test plans from winning, losing, and inconclusive variations. But pause, and take a moment to think more deeply about your data. What’s your Results page telling you, beyond which variation won?
Learning from your test results is a core practice of experience optimization.
Even when you see clear winners and losers among your variations, it’s worth digging into your data to consider why your visitors behaved in a certain way. Analyze winning variations to gain insight into how they shaped your visitors' behaviors. Dig into losing and inconclusive tests to learn more about what your customers expect -- and how you can help to provide it. Evaluate test results alongside your qualitative research and the rationale from your hypothesis to consider why your visitors' behaviors changed. As your understanding of your customers grows, your team’s optimization efforts will become more impactful.
Before you stop a running test, evaluate whether you’ve gathered enough data for your business needs.
In this article, we discuss a few key tactics for analyzing your results. When you’re ready to take action, check out this article. To learn from and take action on inconclusive results, check out these tactics as well as the ones below.
Click to watch a one-minute video on winning and losing experiments on your Results page. To learn more about how Results work in Optimizely in general, read this article about the Results page or view this series of videos.
Materials to prepare
- Test results
- Analytics data synthesized with Optimizely results
- Screenshots of your variations
- Qualitative data (surveys, customer reports)
People and resources
- Program manager
Actions you'll perform
- Segment results to look for patterns
- Check secondary and monitoring goals
- Consider seasonality or traffic spikes
- Check the difference interval
- Use root cause analysis (the "5 Whys") to evaluate why the test affected visitors' behaviors
What to watch out for
- Misaligned goals or an underdeveloped hypothesis, which make it difficult to interpret results
- Emotional attachment to particular outcomes
- Lack of appropriate team member involvement (e.g., the program manager isn't involved in the analysis)
- Failing to document takeaways and communicate the result (whether the variation wins, loses, or is inconclusive)
Once you analyze the results of your experiment, document your insights in your testing roadmap, and share what you’ve learned with stakeholders at your company.
Segment your results
Segmenting your results - or drilling down into your audiences to analyze how different groups behave - is a powerful way to learn more about how visitors respond to changes to your site.
Think of the overall results of your experiment as an average across all your visitors. Not all visitors behave like your average visitor. For example, a returning visitor may have different needs and expectations than a new visitor or a visitor on a mobile device.
Isolate the results for different groups of visitors to see how a specific type of visitor responds to the changes you made. Different types of visitors have different goals on your site. You may find that a change that doesn’t move the needle for most visitors is a huge hit with a certain subset. Or an experience that lifts conversions across the board may be a very bad experience for a particular group.
In the results below, the “Text CTA” variation is a clear winner with lifts in conversions across all goals.
But if you segment for Mobile Visitors only, you see that the “Text CTA” variation is a clear loser for those visitors. Moreover, the “Pop-Up” variation - which hasn’t reached statistical significance for all visitors in aggregate (see the image above) - is driving statistically significant losses across all goals for Mobile Visitors (below).
Segmenting your results helps you surface important information about your visitors that’s lost in the aggregate view.
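To make the idea concrete, here's a minimal Python sketch that computes conversion rates and relative lift per segment from aggregated counts. The segment names, counts, and dictionary layout are hypothetical, invented to mirror the pattern described above (a variation that wins overall but loses on mobile); they aren't Optimizely data or an Optimizely API.

```python
from collections import defaultdict

# Hypothetical aggregated counts: (variation, segment) -> (visitors, conversions).
# Invented for illustration only.
counts = {
    ("Original", "Desktop"): (8000, 400),
    ("Text CTA", "Desktop"): (8000, 560),
    ("Original", "Mobile"): (2000, 120),
    ("Text CTA", "Mobile"): (2000, 80),
}


def conversion_rates(counts):
    """Return {segment: {variation: conversion rate}}, plus an 'All visitors' row."""
    by_segment = defaultdict(dict)
    totals = defaultdict(lambda: [0, 0])  # variation -> [visitors, conversions]
    for (variation, segment), (visitors, conversions) in counts.items():
        by_segment[segment][variation] = conversions / visitors
        totals[variation][0] += visitors
        totals[variation][1] += conversions
    # The aggregate view: pool every segment together.
    by_segment["All visitors"] = {
        variation: conversions / visitors
        for variation, (visitors, conversions) in totals.items()
    }
    return dict(by_segment)


def relative_lift(rates, baseline="Original", variation="Text CTA"):
    """Relative change in conversion rate of `variation` over `baseline`."""
    return (rates[variation] - rates[baseline]) / rates[baseline]


for segment, rates in conversion_rates(counts).items():
    print(f"{segment:>12}: Text CTA lift {relative_lift(rates):+.1%}")
```

Raw segment lifts like these don't account for sample size, and smaller segments are noisier, so check whether a segment's result is statistically significant on your Results page before acting on it.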
Dig into default segments such as browser type or device type, as well as custom visitor segments that are important for your business, to consider the following questions:
- Does any segment of visitors behave differently from visitors overall?
- What do you know about those visitors? Why do you think they respond differently?
- Do your most valuable visitors prefer a certain variation?
Imagine that you’re running an experiment to streamline the login process on your site. You decide to test a Facebook login and find a significant lift in that variation overall. But when you segment your visitors by browser type, you find that visitors using Internet Explorer convert at a much lower rate. In fact, the experiment registers a statistically significant loss for that segment. Why?
Assuming no part of the experiment is broken, you might start by considering what you already know about these visitors. Say you know that Internet Explorer visitors are likely to be older or to come from a professional services environment, compared to visitors on Safari (sometimes linked to higher-income and tech-savvy users). Consider whether conversions on Internet Explorer are lower due to that audience’s expectations. Are professional visitors less likely to log in with a personal account? Do older visitors hesitate before connecting through Facebook?
Use these insights to make business decisions about your results. Will you roll out the Facebook login as an option instead of a requirement? Will you personalize it for just the high-converting segments?
Combine this analysis from segments with your other data, such as results from previous experiments, direct data, and indirect data.
To return to the example above, consider why Mobile Visitors respond differently to your experiment than other visitors. Is the Text CTA difficult to click on mobile? Does the Pop-Up CTA add frustration on a smaller screen?
In the next iteration, these insights are inputs for the experiment results section of your direct data. Include what you’ve learned in your Results Sharing document to spread awareness to the rest of your organization. Others in your organization might benefit from this information -- while you raise the visibility of your program and increase its impact!
Learn more about segmenting your results in this article.
Check secondary and monitoring goals
Optimizely allows you to set a primary goal to measure how successfully your test changes visitor behaviors on your site. Stats Engine weighs that primary goal differently from your other goals. The primary goal is evaluated on its own so your most important success metric reaches significance as soon as possible. Secondary and monitoring goals are all the goals in the experiment that aren’t the primary goal.
As a best practice, we recommend that you set a few secondary goals when you design your test to track conversions down the funnel. This helps you gather data on how the changes you made affect visitor behaviors beyond the page that you’re testing. Monitoring goals help you keep an eye on any interaction effects, so you know what impact your test has on other conversion opportunities on your site.
Once your test reaches significance, check both types of goals to gain a broad view of how your test affects your visitors’ behaviors.
Identify the secondary goals in your experiment and map them onto the steps a visitor takes in your funnel. How does your experiment influence downstream goals?
Here are a few questions to help you evaluate:
- Where in your funnel do you see improvement or loss? Does a pattern emerge?
- Is the exit rate higher on any step in the funnel, compared to the original?
- How does a significant lift or loss at a certain step correspond to changes you’ve made?
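A quick way to spot where visitors drop off is to compare step-to-step exit rates between the original and a variation. This sketch assumes you've exported, for each variation, how many visitors reached each step of a hypothetical four-step funnel; the step names and counts are invented. It treats a step's exit rate as the share of visitors at that step who don't reach the next one.

```python
# Hypothetical counts of visitors who reached each funnel step, per variation.
FUNNEL = ["Product page", "Add to cart", "Checkout", "Purchase"]
reached = {
    "Original": [10000, 1500, 900, 450],
    "Variation": [10000, 1800, 990, 470],
}


def exit_rates(counts):
    """Share of visitors at each step who don't reach the next step."""
    return [1 - nxt / cur for cur, nxt in zip(counts, counts[1:])]


base = exit_rates(reached["Original"])
variation_exits = exit_rates(reached["Variation"])
steps = list(zip(FUNNEL, FUNNEL[1:]))
for (src, dst), b, v in zip(steps, base, variation_exits):
    note = "  <- higher exit than original" if v > b else ""
    print(f"{src} -> {dst}: original {b:.1%}, variation {v:.1%}{note}")
```

In this made-up data, the variation sends more visitors to the cart but loses a larger share at checkout, the kind of pattern worth investigating. Raw rates don't tell you whether a difference is significant, so confirm each step's goal on your Results page.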
To learn more about interpreting your secondary goals, read this article on the five most common patterns in results.
Monitoring goals help you answer: where am I optimizing this experience, and where (if anywhere) am I worsening it? Identify the monitoring goals in your test that measure conversions in revenue paths outside of your funnel and beyond your experiment. Evaluate how these goals compare between your original and winning variations.
Here are a few questions to help you evaluate:
- How does my test affect this monitoring goal?
- Are there multiple monitoring goals? What story do these goals tell together?
- How valuable is my primary goal, compared to the metrics tracked by this monitoring goal?
For example, if you’re optimizing a signup form on your Home page with a more attention-grabbing CTA, your primary goal might be completion of certain fields or clicks to the Submit button. But you might wonder how optimizing this form affects browsing behavior on the Product Categories page. If you set monitoring goals on the search button and pageviews on the Product Categories page, you can evaluate how your sign-up experiment affects purchasing behaviors. Did visitors sign up and then exit the site? Consider how this tradeoff affects key company metrics and the bottom line.
Evaluate all monitoring goals to look for warnings that you’re cannibalizing another revenue path. Use this information to consider new test ideas based on the tradeoffs between these two revenue paths and plan how to implement the results of the current experiment.
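Weighing a tradeoff often comes down to simple arithmetic: multiply each goal's change in conversion rate by the traffic it applies to and the value of a conversion, then add the results. This sketch applies that to the sign-up form example; the traffic, conversion-rate changes, and dollar values are all hypothetical, and it assumes both goals are measured against the same visitors.

```python
# Hypothetical monthly traffic for the sign-up form example.
MONTHLY_VISITORS = 100_000

# Each entry: (goal, absolute change in conversion rate, assumed value per conversion in $).
changes = [
    ("Sign-up (primary goal)", +0.010, 5.00),
    ("Purchase via Product Categories (monitoring goal)", -0.002, 60.00),
]


def monthly_impact(visitors, delta, value):
    """Value gained (or lost, if negative) per month from a change in conversion rate."""
    return visitors * delta * value


net = 0.0
for goal, delta, value in changes:
    impact = monthly_impact(MONTHLY_VISITORS, delta, value)
    net += impact
    print(f"{goal}: {impact:+,.0f} $/month")
print(f"Net impact: {net:+,.0f} $/month")
```

In this made-up case, the gain on the primary goal is outweighed by the loss on the monitoring goal, which is exactly the kind of warning to look for.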
Secondary and monitoring goals provide context for the immediate lifts and losses that the variations generate. They help you guide your program towards a global maximum so you don’t end up refining small parts of your site in isolation. Use this broad, data-driven vision to keep your program focused on providing long-term value to your business.
If your test is taking longer than expected to reach significance, take a look at your primary goal. Is it a high-signal goal?
A high-signal goal measures a behavior that’s directly affected by the changes in your variation. A low-signal goal is not directly impacted by your test. For example, if you add a value proposition such as free shipping on your product details page, the Add-to-Cart click might be a high-signal goal. Clicks to navigation links or revenue at the end of the checkout funnel are low-signal goals; they aren’t the strongest indicators that your new offer works.
Stats Engine calculates your primary goal independently from secondary and monitoring goals; the primary goal will reach significance faster than if it were pooled with those goals. To ensure that your test reaches significance as quickly as possible, use your primary goal to measure a high-signal goal.
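To see why a low-signal goal takes longer, here's a rough sketch using the classic fixed-horizon sample size formula for comparing two conversion rates. Stats Engine uses a sequential approach and doesn't require you to fix a sample size in advance, so treat these numbers as an illustration of the trend only. The rates are hypothetical: a high-signal Add-to-Cart goal at 10% that the free-shipping message lifts to 11%, and a low-signal purchase goal at 2% where the same change shows up only as a move to 2.1%.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variation(p_base, p_var, alpha=0.10, power=0.80):
    """Approximate visitors needed per variation to detect a change from
    p_base to p_var (two-sided test, fixed-horizon normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # alpha=0.10 ~ 90% significance
    z_power = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_var * (1 - p_var)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_var - p_base) ** 2)


high_signal = visitors_per_variation(0.10, 0.11)  # Add-to-Cart clicks
low_signal = visitors_per_variation(0.02, 0.021)  # purchases at the end of the funnel
print(f"High-signal goal: ~{high_signal:,} visitors per variation")
print(f"Low-signal goal:  ~{low_signal:,} visitors per variation")
```

With these made-up rates, the low-signal goal needs roughly 20 times as many visitors, because the change it has to detect is much smaller in absolute terms (0.1 percentage points versus 1).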
If you need to change your primary goal in the middle of your experiment, you can, but we don’t recommend making this a regular practice. Stats Engine will recalculate your test with all previous data, as if the new goal had always been the primary goal. The old primary goal will be pooled with the secondary goals, so it’ll take longer to reach significance than it would otherwise.
Adding too many low-signal monitoring goals can also slow down your experiment. So, take stock of what you need to know for the results of your test and long-term planning, and set your goals accordingly! To learn more about setting different types of goals, check out this article.
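This article doesn't describe exactly how Stats Engine accounts for multiple goals. As a stand-in, this sketch uses the textbook Benjamini-Hochberg false discovery rate procedure to show the general effect of pooling extra goals: the more goals you test together, the stricter the bar each one has to clear. The p-values are hypothetical, and the primary goal, which Stats Engine evaluates on its own, isn't part of this pool.

```python
def benjamini_hochberg(p_values, q=0.10):
    """Indices of goals declared significant while controlling the false
    discovery rate at q (textbook Benjamini-Hochberg step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # goal indices, smallest p first
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            cutoff = rank  # step-up: keep the largest rank that passes
    return sorted(order[:cutoff])


# Hypothetical p-values; goal 0 has the same result in both setups.
few_goals = [0.03, 0.50]
many_goals = [0.03, 0.50, 0.60, 0.70, 0.80]  # three extra low-signal goals
print(benjamini_hochberg(few_goals))   # goal 0 passes
print(benjamini_hochberg(many_goals))  # goal 0 no longer passes
```

The only difference between the two setups is three extra goals with large p-values, yet the same result on goal 0 no longer clears the bar.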
Seasonality and traffic spikes
Before you stop the test, check that you’ve captured all the data you need.
If external events or traffic spikes are influencing your results, or if the difference interval of your statistically significant experiment is too large, consider letting your experiment run longer for a more comprehensive test.
Sometimes, optimization teams focus experiments on high-traffic periods or seasons when they make the most money. Testing during traffic surges can help speed up optimization. But if you’re testing promising experiences that are likely to generate lift - for instance, seasonal messages during the winter holidays - it might be more effective to translate those experiments into personalization campaigns.
By focusing all testing on high-traffic or high-profit periods, you also risk missing part of the conversion cycle; your data will provide an incomplete picture. For example, imagine you run tests on the weekend because most of your visitors make purchases on Saturday and Sunday. If you limit your experiment window to the weekend, you assume that visitors encounter your variation and convert within the same period. But it can take multiple visits for a customer to convert.
To capture data from first interaction to final conversion, run your experiment on weekdays as well as the weekend. Design your experiment to optimize the entire conversion cycle.
In general, test across your full conversion cycle and through peaks and troughs in traffic.
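Here's a rough Python sketch of how a short window cuts off multi-visit conversions. It assumes a hypothetical list of how many days each converting visitor takes from first seeing the variation to converting, and that visitors first arrive evenly across each day of the test window; both are simplifications for illustration.

```python
def captured_share(lags_in_days, window_days):
    """Share of eventual conversions that happen before the test window closes.

    Simplifying assumptions (illustration only): visitors are first exposed
    evenly across each day of the window, and every visitor's days-to-convert
    follows the same list of lags.
    """
    seen = total = 0
    for first_day in range(window_days):
        for lag in lags_in_days:
            total += 1
            # The conversion day must fall inside the window (days 0..window_days-1).
            if first_day + lag < window_days:
                seen += 1
    return seen / total


# Hypothetical days from first exposure to conversion for ten converting visitors.
lags = [0, 0, 0, 1, 1, 2, 3, 5, 7, 10]
print(f"Weekend only (2 days): {captured_share(lags, 2):.0%} of conversions observed")
print(f"Two full weeks (14 days): {captured_share(lags, 14):.0%} of conversions observed")
```

With these invented lags, the weekend-only window observes 40% of conversions versus about 79% over two weeks.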
Broad difference interval
Sometimes, when your goal reaches statistical significance, the difference interval may still be relatively large. The difference interval is the range of values that likely contains the actual difference between the conversion rates of the original and the variation; it tells you what difference in conversion rate you can expect if you implement the variation.
For example, if a variation “wins” with a difference interval of 0.1 to 10%, the lift you can expect if you run that variation is within that broad range of values. Decide whether the level of uncertainty is acceptable for your business before you decide to stop the test.
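To see how sample size affects the width of the interval, here's a sketch using a textbook normal-approximation (Wald) interval for the absolute difference between two conversion rates at 90% significance. Stats Engine calculates its difference interval differently, because it's designed to let you monitor results continuously, so don't expect these numbers to match your Results page. The conversion counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def difference_interval(conv_a, n_a, conv_b, n_b, significance=0.90):
    """Normal-approximation interval for the absolute difference in conversion
    rate (variation B minus original A). Fixed-horizon sketch, not Stats Engine."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    z = NormalDist().inv_cdf(0.5 + significance / 2)  # about 1.645 for 90%
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


# Same hypothetical conversion rates (10% vs 12%), ten times the traffic.
small = difference_interval(200, 2000, 240, 2000)
large = difference_interval(2000, 20000, 2400, 20000)
for label, (low, high) in [("2,000 per variation", small), ("20,000 per variation", large)]:
    print(f"{label}: {low:+.2%} to {high:+.2%} (width {high - low:.2%})")
```

Ten times the traffic gives an interval about 3.2 times narrower, because the width shrinks with the square root of the number of visitors. Letting a test run longer tightens the interval, but with diminishing returns.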
If your primary goal is revenue-generating, a narrower difference interval can help you project the impact of this change more precisely. In other words, you’d be able to predict whether your improvement is worth $1,000 or $1 million. If you’re making a business case such as asking for more developer resources to push changes live to your site, it can help to be more specific.
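Projecting the business impact of an interval is a multiplication. This sketch treats the 0.1% to 10% range from the example above as a relative lift (an assumption; check whether the interval you're looking at is absolute or relative) and applies it to a hypothetical $2 million in annual revenue flowing through the tested path.

```python
def projected_revenue_range(lift_interval, baseline_annual_revenue):
    """Translate a relative-lift interval into a range of added annual revenue."""
    low, high = lift_interval
    return low * baseline_annual_revenue, high * baseline_annual_revenue


# Hypothetical: $2M a year flows through the tested path; lift interval 0.1% to 10%.
low, high = projected_revenue_range((0.001, 0.10), 2_000_000)
print(f"Projected added revenue: ${low:,.0f} to ${high:,.0f} per year")
```

The result, $2,000 to $200,000 a year, spans two orders of magnitude, which makes for a weak business case.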
Segment your results to see if a certain subset of visitors is moving the needle. Create a new experiment that is targeted specifically to those visitors to see if you can recreate that lift. If this subset of visitors behaves consistently over time, the new experiment’s results will likely show the improvement with a smaller difference interval.
If the primary goal is engagement or user acquisition instead of revenue, a large difference interval and a more nebulous result may serve your purposes just as well -- a more precise prediction may not make a difference. Since you know that the change led to a better experience in terms of overall conversions, you can feel comfortable pushing the changes live to your site.
Once you’ve analyzed your results and documented what you’ve learned, you’re ready to decide how to take action.