Optimizely Knowledge Base

Resolving discrepancies between data pipelines

This article will help you:

  • Understand why Optimizely may appear to deliver inconsistent data through the results export and raw events export pipelines
  • Resolve any discrepancies you encounter

Optimizely offers two data export pipelines: the raw events export service, which gives you access to all your Optimizely events, and the results export, which provides secure access to your Optimizely results data in the cloud so you can store, analyze, combine, or visualize the data as you see fit.

Both data sources are stored on Amazon S3, but there are significant differences between them. In some cases, you may observe differences between the output of your raw events export query and the results you get in Optimizely. This article will help you understand these differences and provide some practical steps to mitigate them. 

Overview of results and raw events export pipelines

Optimizely’s results page is powered by a real-time data pipeline. This pipeline (henceforth called the results export) processes events as soon as it receives them by running several data enrichments, like adding session-level information and applying Optimizely’s event attribution rules. Afterwards, the events are available for querying via the Results API, and on the results page. This usually happens within a few minutes.

The results export is actually a subset of the raw events export, containing all the conversions (results records) Optimizely attributed to experiments and counted in the results. It's also an exact copy of the data Optimizely uses to compute the results page. You can use it for tasks like these:

  • Analyzing experiment results via SQL, using apps or services you already use.

  • Combining your experiment results with other data sources in your data warehouse to measure experimentation impact on external metrics.

  • Creating data-driven dashboards using visualization tools you are familiar with.

On the other hand, raw events export is a static data pipeline that processes all events received over the past 24 hours and stores them in raw event format in S3. It contains the original raw event data as sent by the clients, without any processing or attribution. It also offers transparency on all event data collected by Optimizely, and it's useful for running any event-level analysis on customer data (for example, querying events by a given visitorId or with a specific event tag value).

For more detailed information on getting started with these data export pipelines, partitioning, importing data export partitions into SQL tables, and querying those tables, see our developer documentation.

This diagram illustrates the differing roles of the raw events export and the results export:

These raw events are organized in folders formatted in this way:


The date in the folder path refers to the date the event was received by the Optimizely server (i.e. the server timestamp in PST), and not to the timestamp of the event itself. For example, if an event has a timestamp date of "2017-12-01" but is not received by the server until the following day, it will be saved in the "2017-12-02" folder partition. This has important implications for raw data querying requirements.

Key differences between pipelines

The table outlines the key differences between the raw events export and results export.

Optimizely considers discrepancies within 5% to be acceptable for most customer scenarios.


| | Results export | Raw events export |
|---|---|---|
| Data availability | Next day (usually ~8am PST) | Next day (usually ~8am PST) |
| | All conversions Optimizely attributed and counted in the results. Includes server processing information such as easy event attribution and session data. | All events Optimizely received from the customer, with the exact (unprocessed) data sent by the client. |
| | Session Data | Raw Events |
| | Apache Parquet | |
| Analysis Type | | |
| Example Queries | “query revenue per visitor”; “query conversions per session” | “query events sent by IP a.b.c.d”; “query events with orderId=xyz” |
| | | Inspect all events sent by a visitor. Retrieve conversions that have a specific attribute or tag. |
| Event attribution | Event attribution rules are applied automatically. | Event attribution must be manually executed. |
| Time zone | Results page uses the browser’s time zone setting by default for the results query. For example, a results query for ‘2017-12-01’ to ‘2017-12-07’ from San Francisco will first convert the date range to PST and then consider all events with a timestamp within the converted date range. | Event timestamps use UTC epoch time. |
| Delayed events | Included in the results a few seconds after they are received by our servers. | Included in the raw data. Note that these events will end up in a daily partition that is based on the event’s arrival date. |
| Out of bounds events | If an event has a timestamp outside of the experiment’s valid running range, it is automatically excluded from the results. | Included in the daily raw data partition. Must be manually excluded from the query using proper begin/end timestamp conditions. |
| Duplicate events | Duplicate events are automatically de-duplicated using the event UUID. | Events must be manually de-duplicated using the event_uuid field. |
| Holdback events | Holdback traffic does not appear in the results. | Must be manually excluded using the isHoldbackTraffic filter. |
| | For personalization, sessionization is performed automatically by the results pipeline to calculate total sessions and other session metrics. | Events must be manually grouped into sessions. |
| IP filtering | The IP filters (in web project settings) are applied in real time. | Events from filtered IPs must be manually excluded. |
| Results resetting | Removes all data from the experiment's start time until the time of the reset. | Not applicable. Resetting does not erase raw data. |
| Traffic reallocation | If visitors are allocated (or re-allocated) to more than one variation, Optimizely's event attribution model does not behave as expected, and results are invalid. | Raw events are attributed to the appropriate variation. Because the event attribution model does not behave predictably, any comparison with the results is very difficult. This is not a supported scenario. |

Event attribution

Optimizely uses specific counting rules to count visitors, sessions, and conversions in the Results page. These rules make up Optimizely’s event attribution model. These rules must be applied manually when querying raw data and matching that data with results. The table below summarizes the rules used by each Optimizely product.


| | Unique Users/Sessions | Unique Conversions | Total Conversions |
|---|---|---|---|
| Web Experimentation | Count distinct visitorIDs that sent >=1 decision or conversion. | Count distinct visitorIDs that sent >=1 conversion. | Count total conversions sent. |
| Full Stack | Count distinct visitorIDs that sent >=1 decision. | Count distinct visitorIDs that sent >=1 conversion after the first decision. | Count total conversions after the first decision. |
| Web Personalization | Count total sessions that had >=1 decision. These are called qualified sessions. | Count qualified sessions with >=1 conversion after the decision. | Count total conversions sent from qualified sessions after the decision. |
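As an illustration of applying these rules by hand, the following Python sketch implements the Full Stack counting rules over a list of raw events. It is not Optimizely's implementation: the dictionary keys (`visitor_id`, `type`, `timestamp`) are hypothetical stand-ins for whatever your imported raw events table uses, and treating "after the first decision" as "at or after" is an assumption.

```python
from collections import defaultdict


def full_stack_counts(events):
    """Apply the Full Stack counting rules to a list of raw events.

    Each event is a dict with hypothetical keys: 'visitor_id',
    'type' ('decision' or 'conversion'), and 'timestamp' (epoch ms).
    """
    # Earliest decision timestamp per visitor.
    first_decision = {}
    for e in events:
        if e["type"] == "decision":
            v = e["visitor_id"]
            first_decision[v] = min(e["timestamp"], first_decision.get(v, e["timestamp"]))

    conversions_per_visitor = defaultdict(int)
    for e in events:
        if e["type"] != "conversion":
            continue
        v = e["visitor_id"]
        # Assumption: "after the first decision" means at or after its timestamp.
        if v in first_decision and e["timestamp"] >= first_decision[v]:
            conversions_per_visitor[v] += 1

    return {
        "unique_users": len(first_decision),
        "unique_conversions": len(conversions_per_visitor),
        "total_conversions": sum(conversions_per_visitor.values()),
    }


sample_events = [
    {"visitor_id": "a", "type": "conversion", "timestamp": 50},   # before the decision: ignored
    {"visitor_id": "a", "type": "decision", "timestamp": 100},
    {"visitor_id": "a", "type": "conversion", "timestamp": 150},
    {"visitor_id": "a", "type": "conversion", "timestamp": 200},
    {"visitor_id": "b", "type": "decision", "timestamp": 100},    # bucketed, never converted
    {"visitor_id": "c", "type": "conversion", "timestamp": 300},  # never bucketed: ignored
]
counts = full_stack_counts(sample_events)
```

The Web Experimentation and Web Personalization rows follow the same pattern, with visitors or sessions as the counting unit.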

Time zone

By default, the Results page queries the results pipeline using the time zone of the user’s browser. For example, if a user visits the Results page from San Francisco and queries results for 2017-12-01 to 2017-12-07, the results pipeline returns results from 2017-12-01 12:00am Pacific time to 2017-12-07 11:59pm Pacific time. If a collaborator then visits the same Results page from New York and queries the same date range, the results pipeline returns results from 2017-12-01 12:00am Eastern time to 2017-12-07 11:59pm Eastern time. The results will therefore differ for these two users, even when all else is equal.

The raw data does not use time zones. By default, all event timestamps use epoch time in UTC/GMT. You must manually ensure that the beginning and ending times for any raw data query match those used in the Results page. To do this, follow these steps:

  1. Navigate to the Results page and select the desired date range.

  2. Look at the Results page’s URL. Copy the timestamp values for the &beginDate and &endDate URL parameters. Each should be a 13-digit integer.

  3. In the raw data query, use these values for your timestamp filter:


SELECT *
FROM events
WHERE timestamp >= {beginDate} AND timestamp <= {endDate}

If the &beginDate and &endDate do not appear in the results URL, manually select the time range from the date picker. The results page will load and the URL will now contain the parameters.
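To see why two collaborators in different time zones get different bounds, or to sanity-check the values copied from the URL, you can compute the epoch-millisecond range yourself. The Python sketch below is only an illustration: the function name is hypothetical, fixed UTC offsets stand in for the browser's time zone, and treating the end bound as the last millisecond of the final day is an assumption about how the Results page rounds.

```python
from datetime import datetime, timedelta, timezone


def date_range_to_epoch_ms(start_date, end_date, tz):
    """Convert an inclusive local date range to (begin_ms, end_ms) in UTC epoch ms."""
    start = datetime.strptime(start_date, "%Y-%m-%d").replace(tzinfo=tz)
    # Midnight at the start of the day after end_date, in the same zone.
    day_after_end = datetime.strptime(end_date, "%Y-%m-%d").replace(tzinfo=tz) + timedelta(days=1)
    begin_ms = int(start.timestamp() * 1000)
    end_ms = int(day_after_end.timestamp() * 1000) - 1  # last millisecond of end_date
    return begin_ms, end_ms


# Fixed offsets for December (standard time); pass a zoneinfo.ZoneInfo
# object instead if you need daylight saving time handled.
PACIFIC_STANDARD = timezone(timedelta(hours=-8))
EASTERN_STANDARD = timezone(timedelta(hours=-5))

san_francisco = date_range_to_epoch_ms("2017-12-01", "2017-12-07", PACIFIC_STANDARD)
new_york = date_range_to_epoch_ms("2017-12-01", "2017-12-07", EASTERN_STANDARD)
```

The two ranges are offset by three hours, so events near either boundary fall into one user's query but not the other's.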

Delayed events

In some situations, events may arrive after a considerable delay. These include mobile scenarios where visitors might suspend the browser app before an event is fired and then resume it hours or days later, as well as Full Stack scenarios where the developer might intentionally queue events and batch-upload them in the future.

Optimizely’s results pipeline relies on the event’s timestamp—instead of its arrival time—to attribute the event to the correct time range, while raw events export simply stores all events to daily partitions according to the time they were received. For example, an event with a timestamp of 2017-12-01 that is received on 2017-12-07 will be stored in the 2017-12-07 partition.

This difference can create an artificial discrepancy between results and raw data. To reconcile these discrepancies, follow these steps:

  1. In your raw data query, select all daily partitions from the time the experiment started until a few days after the experiment ended, even if you are only interested in a sub-range of the experiment’s running time.

    For example, if the experiment ran from 2017-12-01 to 2017-12-07 and you are interested in a comparison between 2017-12-01 and 2017-12-02, you would select the daily partitions for 2017-12-01 to 2017-12-15 (a week after the experiment’s completion). This ensures that the majority of delayed events will be captured by your raw event query.

  2. Next, select events for a particular time window by following the steps described earlier (in the time zone section). Here’s how it would look for a sub-range of 2017-12-01 to 2017-12-02:


SELECT *
FROM events  -- includes data from the 2017-12-01 to 2017-12-15 partitions
WHERE timestamp >= 1512086400000    -- epoch ms for 2017-12-01 12:00am UTC
  AND timestamp <= 1512259199000    -- epoch ms for 2017-12-02 11:59pm UTC
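Step 1 can be scripted so that the partition list always includes the post-experiment buffer. This is a minimal sketch: the function name is hypothetical, partitions are represented only by their date strings (not the actual folder path), and the buffer is counted from the day after the experiment's last running day so that the example reproduces the 2017-12-15 end date used above.

```python
from datetime import date, timedelta


def partitions_to_scan(experiment_start, experiment_end, buffer_days=7):
    """Return the daily partition dates (YYYY-MM-DD) to include in a raw data query.

    Covers the experiment's full run plus buffer_days counted from the day
    after the last running day, so most delayed events are captured.
    """
    first = date.fromisoformat(experiment_start)
    last = date.fromisoformat(experiment_end) + timedelta(days=1 + buffer_days)
    return [(first + timedelta(days=i)).isoformat() for i in range((last - first).days + 1)]


partitions = partitions_to_scan("2017-12-01", "2017-12-07")
```

A longer buffer catches more late arrivals at the cost of scanning more data; the timestamp filter in step 2 still restricts the result to the window you care about.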

Out of bounds events

Out of bounds events have a timestamp outside the experiment’s valid time range. For example, an event with a timestamp of 2010-10-01 would be considered out of bounds for an experiment running between 2012-12-01 and 2012-12-07. Out of bounds events can occur if the clock settings on the browser sending the events were incorrect. The developer might also have used incorrect timestamps during event tracking.

The results pipeline will automatically exclude out of bounds events. However, in the raw data, these events will still be present in the daily partition, organized by the date they were received by Optimizely’s servers. You will have to manually exclude them using appropriate time filters, as described in the previous two sections (time zone and delayed events).

Duplicate events

Duplicate events occur when the same event is sent multiple times by the client. Optimizely's results pipeline relies on a unique event identifier (UUID) that should be present in the event payload to de-duplicate events. The web client and all Full Stack SDKs automatically generate event UUIDs during event tracking. Event API developers are responsible for generating event UUIDs on their own.

While de-duplication in the results pipeline is automatic, it must be performed manually when querying raw data:

SELECT count(distinct eventId)
FROM events
WHERE event_name = 'eventName'
  AND timestamp >= 1512086400000
  AND timestamp <= 1512259199000
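To check the effect of de-duplication on a small sample before running it against your warehouse, the same kind of query can be exercised with an in-memory SQLite table. This is only a sketch: the table layout and the `uuid` column name are assumptions for illustration, so substitute the identifier column your imported raw events table actually exposes.

```python
import sqlite3


def count_unique_events(rows, event_name, begin_ms, end_ms):
    """Count distinct event UUIDs for one event name inside a timestamp window.

    rows is an iterable of (uuid, event_name, timestamp_ms) tuples; the
    schema is a hypothetical stand-in for an imported raw events table.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (uuid TEXT, event_name TEXT, timestamp INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    (count,) = conn.execute(
        "SELECT COUNT(DISTINCT uuid) FROM events "
        "WHERE event_name = ? AND timestamp >= ? AND timestamp <= ?",
        (event_name, begin_ms, end_ms),
    ).fetchone()
    conn.close()
    return count


sample_rows = [
    ("u1", "purchase", 1512086400000),
    ("u1", "purchase", 1512086400000),  # duplicate send of the same event
    ("u2", "purchase", 1512100000000),
    ("u3", "purchase", 1512300000000),  # outside the time window
    ("u4", "pageview", 1512100000000),  # different event name
]
unique_purchases = count_unique_events(sample_rows, "purchase", 1512086400000, 1512259199000)
```

Counting rows without DISTINCT on the same sample would report three purchases in the window instead of two, which is the kind of inflation that shows up as a discrepancy against the Results page.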

Holdback traffic

The holdback visitor group is traffic that is not bucketed into a variation; instead, it is bucketed into an empty or generic experience that is similar to a control. One important difference is that the holdback traffic does not appear in the results for an A/B experiment, while control traffic does. When querying raw events, you must manually exclude holdback traffic using the available IsHoldback filter:

SELECT count(distinct UUID)
FROM events
WHERE event_name = 'eventName'
  AND IsHoldback = false    -- exclude holdback traffic
  AND timestamp >= 1512086400000
  AND timestamp <= 1512259199000

IP filtering

If the project is using IP filtering (available for X Web only), the Results page will automatically exclude any visitors on the filtered IP list. However, this is not true for raw events export. VisitorIDs and their conversions must be manually excluded using the same IP filter conditions in the SQL query.
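The exclusion can be sketched in Python as follows. Several things here are assumptions rather than documented behavior: the event keys (`visitor_id`, `ip`) are hypothetical, the filters are expressed as CIDR ranges for simplicity (your project's IP filters may use a different matching syntax), and a visitor is dropped entirely once any of their events comes from a filtered address.

```python
import ipaddress


def exclude_filtered_visitors(events, filtered_networks):
    """Drop every event belonging to a visitor seen on a filtered IP range.

    events: list of dicts with hypothetical keys 'visitor_id' and 'ip'.
    filtered_networks: CIDR strings such as '203.0.113.0/24'.
    """
    networks = [ipaddress.ip_network(n) for n in filtered_networks]
    blocked_visitors = {
        e["visitor_id"]
        for e in events
        if any(ipaddress.ip_address(e["ip"]) in net for net in networks)
    }
    return [e for e in events if e["visitor_id"] not in blocked_visitors]


sample_events = [
    {"visitor_id": "v1", "ip": "203.0.113.5"},    # filtered office address
    {"visitor_id": "v1", "ip": "198.51.100.9"},   # same visitor elsewhere: still excluded
    {"visitor_id": "v2", "ip": "198.51.100.10"},
]
kept = exclude_filtered_visitors(sample_events, ["203.0.113.0/24"])
```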

Results resetting

When results are reset, the underlying raw event data is not deleted. Because a reset removes everything from the start of the experiment up to the time of the reset from the results, make sure your raw data query does not count events from before the reset. You can do this with the time filters described in the previous sections.