- Optimizely X Full Stack
THIS ARTICLE WILL HELP YOU:
- Balance latency and freshness in datafile management
- Navigate the tradeoffs for your implementation and performance constraints
- Decide how to implement datafile synchronization
In Optimizely Full Stack, you configure experiments and variations in a web interface hosted on app.optimizely.com. Then, you implement an SDK in your application to bucket users into the right variations based on that configuration. The link between these two systems is called the datafile.
The datafile is a JSON representation of all the experiments, audiences, metrics, etc. that you’ve configured in a project. Here's how to access the datafile in Optimizely.
Whenever you make a change in Optimizely's interface, such as starting an experiment or changing its traffic allocation, Optimizely automatically builds a new revision of the datafile and serves it on cdn.optimizely.com. To incorporate this change, your application needs to download the new datafile and cache it locally so it can make decisions quickly. We call this process datafile synchronization.
There are several different approaches you can take to datafile synchronization, depending on your application’s needs. In general, these approaches trade off latency and freshness. Finding the right balance ensures that your datafile stays up to date without slowing down your application. This guide walks through the best practices and alternatives for striking this balance.
There is no one right answer for managing the datafile because every application has different implementation and performance constraints. We recommend evaluating these options and then standardizing on your approach to datafile management by building your own wrapper around our SDKs. This wrapper can also capture other context-specific options like event dispatching and logging.
Datafile management is implemented out of the box in our iOS and Android SDKs. For more information, see the Initialization section of our mobile developer documentation. This guide focuses on managing datafiles in server-side contexts.
Understanding the tradeoffs
To understand the tradeoffs in datafile management, it helps to consider the most naive approach.
Imagine: Every time an experiment runs, you fetch the latest datafile from the CDN, then use it to initialize an Optimizely client and make a decision. This approach guarantees you the latest datafile, but it comes at a major performance cost. Every decision requires a round-trip network request. In asynchronous contexts like SMS or chatbots, this can work. But we don’t recommend it for synchronous use cases like a web server or API.
Instead, we recommend caching a local copy of the datafile within your application, then synchronizing it periodically. This lets you make experiment decisions immediately without waiting for a network request, while still keeping the configuration up to date.
For example, you can set up a timer to re-download the datafile every five minutes and store it in memory, then read from there every time you make a request. Or, you could use a webhook to keep a centralized service in sync and then make internal HTTP requests for the datafile. As these examples illustrate, there are several choices to consider when implementing datafile synchronization:
- Where to store the datafile: Locally in memory, on the filesystem, or on a separate service
- When to update it: Via a “pull” model that polls for updates on a regular interval, or by listening for a “push” from a webhook
- How to fetch it: Directly from the CDN, or a private authenticated endpoint
The sections below walk through the best practices and tradeoffs of each approach.
Consider the options below when deciding where to store the datafile.
Best performance: We recommend storing the datafile directly in memory. You'll be able to look up the datafile with near-zero latency, so your web service stays performant. In a simple application, this can be done by instantiating the Optimizely object directly and passing it around as needed. In more complex applications, you can use tools like Memcached.
Multiple processes that need to share the same datafile configuration: We recommend keeping the datafile in some kind of local storage. For example, you could keep the JSON file directly on your local file system; generally this is slower than a memory lookup but faster than a network request. Alternatively, you can store the datafile in a distributed store like Redis. In general, we recommend systems that allow fast reads and relatively fast writes.
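As a rough illustration of the filesystem option, the sketch below writes the datafile to a temporary file and then renames it over the cached copy, so other processes reading the file never see a partially written datafile. The function names and the cache path are hypothetical and not part of any Optimizely SDK.

```python
import json
import os
import tempfile


def write_datafile(path: str, datafile_json: str) -> None:
    """Atomically replace the cached datafile at `path` (hypothetical helper)."""
    json.loads(datafile_json)  # Fail early if the payload is not valid JSON.
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory, then rename it over the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(datafile_json)
        os.replace(tmp_path, path)  # Readers see the old file or the new one, never a mix.
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_datafile(path: str) -> dict:
    """Load and parse the cached datafile (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

A single updater process could call write_datafile whenever it downloads a new revision, while each worker process calls read_datafile at startup or on its own schedule.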
Colocated services: If you have many colocated services that all need to operate off the same datafile, you can consider hosting the Optimizely SDK as its own independent service. This experimentation service can expose SDK methods like activate and track as HTTP endpoints that other services can hit. Then, you can implement datafile synchronization within the service using any of the methods above. This approach adds a small latency hit from the internal network request, but it makes implementing in a microservice architecture substantially easier, especially if you have many different types of services operating in different languages. The centralized endpoint allows you to implement the logic just once, rather than separately in each service.
With any of these methods, you have the flexibility to choose what format to cache the datafile in. The most common approach is to store the JSON string of the datafile itself. This is sufficient in many cases, but note that JSON parsing can take as much as 100 ms depending on the language, load, and datafile size. Even when parsing is much faster, be careful with implementations that require repeated JSON parsing. For example, don’t re-instantiate the SDK within a loop, or this cost can quickly add up. If you do need to instantiate repeatedly, pass in the already-parsed object or already-instantiated Optimizely client rather than the raw datafile to avoid this cost.
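One way to avoid repeated parsing is to keep the parsed object next to the raw string and re-parse only when the string changes. The class below is a minimal sketch of that idea; ParsedDatafileCache and its parse_count attribute are hypothetical names used for illustration.

```python
import json
from typing import Optional


class ParsedDatafileCache:
    """Keeps the parsed datafile and re-parses only when the raw JSON changes."""

    def __init__(self) -> None:
        self._raw: Optional[str] = None
        self._parsed: Optional[dict] = None
        self.parse_count = 0  # Exposed only to make the saving visible.

    def get(self, raw_json: str) -> dict:
        # Comparing strings is far cheaper than parsing the JSON again.
        if raw_json != self._raw:
            self._parsed = json.loads(raw_json)
            self._raw = raw_json
            self.parse_count += 1
        return self._parsed
```

Calling get in a request loop with an unchanged datafile parses it once and returns the same object afterwards. The same pattern applies to holding one already-instantiated client instead of building a new one per request.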
The other key consideration is when to update the datafile. In general, you have two choices: pull or push. Both approaches are valid, and we recommend using both for best results.
Pull: The “pull” approach consists of polling the Optimizely CDN on a regular interval and then updating the stored datafile whenever a new revision is available. Polling is generally easy to implement through a timer or CRON job.
Pulling works best in cases where you don’t need instant updates. For example, if you’re comfortable with pressing the “pause” button on an experiment that’s performing badly and waiting a while for the change to percolate to your users, then polling on a 5 or 10 minute interval is fine.
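A polling loop along these lines might look like the sketch below. It assumes the datafile JSON carries a top-level "revision" field to compare against, and it takes the download step as a caller-supplied fetch function so the loop itself has no network dependency. DatafilePoller and its method names are hypothetical.

```python
import json
import threading
from typing import Callable, Optional


class DatafilePoller:
    """Polls for the datafile on an interval and keeps the latest copy in memory."""

    def __init__(self, fetch: Callable[[], str], interval_seconds: float = 300.0):
        self._fetch = fetch  # Caller-supplied function returning the datafile JSON string.
        self._interval = interval_seconds
        self._lock = threading.Lock()
        self._datafile: Optional[dict] = None
        self._stop = threading.Event()

    def poll_once(self) -> bool:
        """Fetch the datafile; return True if a new revision was stored."""
        try:
            candidate = json.loads(self._fetch())
        except Exception:
            return False  # Keep serving the cached copy on network or parse errors.
        with self._lock:
            current = self._datafile
            # Assumption: "revision" changes whenever the project configuration changes.
            if current is not None and current.get("revision") == candidate.get("revision"):
                return False
            self._datafile = candidate
            return True

    def get(self) -> Optional[dict]:
        with self._lock:
            return self._datafile

    def start(self) -> threading.Thread:
        def loop() -> None:
            self.poll_once()
            # Event.wait returns False on timeout and True once stop() is called.
            while not self._stop.wait(self._interval):
                self.poll_once()

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread

    def stop(self) -> None:
        self._stop.set()
```

Request handlers would call get() to read the cached datafile without touching the network, while the background thread refreshes it every five minutes by default.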
Push: If you need faster updates, e.g. every time a feature flag is toggled, we recommend “pushing” the changes as soon as you make them. You can configure a webhook in Optimizely to ping your server as soon as a change is made, so you can pull the update down immediately.
This is the preferred approach for server-side contexts with a reliable network connection, but note that it doesn’t usually apply to web and mobile clients. For an example of webhooks in action, see our Python demo app.
Wherever possible, we recommend using both approaches together. Use a webhook as the primary means for keeping your datafile up to date, but keep polling on a regular interval as a fallback in case the webhook fails for any reason.
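The combination can be sketched as a single refresh routine with two triggers: the webhook handler calls it immediately, and a fallback timer calls it only if no refresh has succeeded recently. The code below treats any webhook delivery as a signal to re-download and does not inspect or verify the webhook payload, which a real handler should do. DatafileSync, its method names, and the 10-minute default are assumptions made for illustration.

```python
import json
import threading
import time
from typing import Callable, Optional


class DatafileSync:
    """One refresh path shared by a webhook trigger and a polling fallback."""

    def __init__(
        self,
        fetch: Callable[[], str],
        fallback_interval: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch  # Caller-supplied function returning the datafile JSON string.
        self._fallback_interval = fallback_interval
        self._clock = clock
        self._lock = threading.Lock()
        self._datafile: Optional[dict] = None
        self._last_refresh: Optional[float] = None

    def refresh(self) -> bool:
        """Download and store the datafile; return True on success."""
        try:
            datafile = json.loads(self._fetch())
        except Exception:
            return False  # Keep the previous copy if the download or parse fails.
        with self._lock:
            self._datafile = datafile
            self._last_refresh = self._clock()
        return True

    def on_webhook(self) -> bool:
        """Call this from the HTTP handler that receives the webhook."""
        return self.refresh()

    def poll_if_stale(self) -> bool:
        """Call this from a timer; it refreshes only if the copy is older than the interval."""
        with self._lock:
            last = self._last_refresh
        if last is not None and self._clock() - last < self._fallback_interval:
            return False
        return self.refresh()

    def current(self) -> Optional[dict]:
        with self._lock:
            return self._datafile
```

With this shape, a missed webhook delays an update by at most one fallback interval, and a healthy webhook keeps the fallback timer from issuing redundant downloads.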
Regardless of when you choose to update the datafile, you have flexibility in where you download it from. You can fetch the datafile for your Optimizely project from Optimizely’s CDN. For example, if the ID of your project is 12345 you can access the file at https://cdn.optimizely.com/json/12345.json.
Read about accessing the datafile for more on finding this URL.
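For reference, downloading the datafile from the CDN URL pattern shown above needs only the Python standard library. This is a minimal sketch: the helper names and the 5-second timeout are assumptions, and the project ID 12345 is the placeholder from the example above.

```python
import json
import urllib.request

CDN_URL_TEMPLATE = "https://cdn.optimizely.com/json/{project_id}.json"


def datafile_url(project_id) -> str:
    """Build the CDN URL for a project's datafile."""
    return CDN_URL_TEMPLATE.format(project_id=project_id)


def fetch_datafile(project_id, timeout_seconds: float = 5.0) -> dict:
    """Download and parse the datafile; raises on network or JSON errors."""
    with urllib.request.urlopen(datafile_url(project_id), timeout=timeout_seconds) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires network access and a real project ID in place of the placeholder.
    print(datafile_url(12345))
```

Setting an explicit timeout keeps a slow CDN response from blocking the thread that performs synchronization.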