Data and concept drift are frequently mentioned in the context of machine learning model monitoring, but what exactly are they, and how are they detected? Furthermore, given the common misconceptions surrounding them, are data and concept drift things to be avoided at all costs, or are they natural and acceptable consequences of serving models in production? Read on to find out. In this article we provide a granular breakdown of data and concept drift, along with methods for detecting each and best practices for dealing with them when they occur.
Perhaps the more common of the two types of model drift is data drift, which simply refers to any change in the data distribution after the model has been trained. In other words, data drift commonly occurs when the inputs a model is presented with in production no longer correspond to the distribution it saw during training. This typically presents itself as a change in the feature distribution: certain values for a given feature may become more common in production, while others may see a decrease in prevalence. As an example, consider an ecommerce company serving an LTV (customer lifetime value) prediction model with the goal of optimizing its marketing efforts. A reasonable feature for such a model would be a customer's age. Now, suppose this same company changed its marketing strategy, perhaps by launching a new campaign targeted at a specific age group. In this scenario, the distribution of ages being fed to the model would likely change, causing a distribution shift in the age feature and perhaps a degradation in the model's predictive capacity. This would be considered data drift.
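To make this concrete, here is a minimal sketch of one common way to surface such a shift: comparing the training and production distributions of a single numeric feature with a two-sample Kolmogorov-Smirnov test. The feature values, the `detect_feature_drift` helper, and the significance threshold are hypothetical illustrations, not a prescribed implementation.

```python
import numpy as np
from scipy import stats

def detect_feature_drift(train_values, prod_values, alpha=0.05):
    """Flag drift in a single numeric feature using a two-sample KS test.

    train_values / prod_values are 1-D arrays of the feature (e.g. customer age)
    as seen during training and in production. alpha is an illustrative threshold.
    """
    statistic, p_value = stats.ks_2samp(train_values, prod_values)
    return {"ks_statistic": statistic, "p_value": p_value, "drift": p_value < alpha}

# Hypothetical example: production skews older after a new marketing campaign.
rng = np.random.default_rng(0)
train_age = rng.normal(35, 8, size=5_000)   # ages seen during training
prod_age = rng.normal(45, 8, size=5_000)    # ages seen in production
print(detect_feature_drift(train_age, prod_age))
```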
Contrary to popular opinion, not all data drift is bad or implies that your model needs retraining. For example, your model in production may encounter more customers in the 50-60 age bracket than it saw during training. However, this does not necessarily mean that the model saw an insufficient number of 50-60-year-olds during training, but rather that the distribution of ages presented to the model has simply shifted. In this case, retraining the model would likely be unnecessary.
Other cases, however, do demand retraining. For example, your training dataset may have been small enough that your model didn't encounter any outliers, such as customers over the age of 100. When deployed in production, though, the model might very well see such customers. In this case, the data drift is problematic, and addressing it is essential. Having a way to assess and detect the different types of data drift an ML model may encounter is therefore critical to getting the best performance out of it.
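As a rough illustration of catching this kind of problematic drift, one option is to record the observed range (here, extreme quantiles) of each feature at training time and track how often production values fall outside it. The helper, quantile cut-offs, and data below are assumptions for the sake of example.

```python
import numpy as np

def out_of_range_rate(train_values, prod_values, lower_q=0.001, upper_q=0.999):
    """Fraction of production values outside the (quantile-based) training range.

    A persistently high rate suggests the model is seeing inputs (e.g. customers
    over 100) that were effectively absent from the training data.
    """
    lo, hi = np.quantile(train_values, [lower_q, upper_q])
    outside = (prod_values < lo) | (prod_values > hi)
    return outside.mean()

# Hypothetical example: a handful of 100+ year-old customers appear in production.
rng = np.random.default_rng(1)
train_age = np.clip(rng.normal(35, 8, size=5_000), 18, 75)
prod_age = np.append(train_age, [101, 104, 107, 110])
print(f"out-of-range rate: {out_of_range_rate(train_age, prod_age):.4%}")
```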
Concept drift refers to a change in the relationship between a model's inputs and its target variable. This can happen when shifts in market dynamics, customer behavior, or demographics create new relationships between inputs and targets that degrade your model's predictions. The key to differentiating concept drift from data drift is the targets: data drift applies only when your model encounters new, unseen, or shifting input data, whereas concept drift occurs when the fundamental relationship between inputs and outputs changes, including on data the model has already seen. Going back to our LTV prediction model, suppose a country-wide economic shift means that customers in a certain age group suddenly have more money to spend, resulting in more purchases of your business's product within that demographic. This happened quite dramatically during the Covid-19 pandemic, when US government-issued stimulus checks landed in the hands of millions of underemployed millennials across the country. In this case, the number of millennials interacting with your model wouldn't necessarily change, but the amount they spend on purchases would. Detecting this concept drift and retraining the model would be vital to maintaining its performance.
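Since concept drift lives in the input-to-target relationship rather than in the inputs themselves, one simplified way to surface it, sketched below, is to track prediction error against ground-truth outcomes once they arrive and alert when a recent window degrades relative to an earlier reference window. The window sizes, tolerance, and error data are illustrative assumptions.

```python
import numpy as np

def concept_drift_alert(errors, reference_size=1_000, recent_size=200, tolerance=1.25):
    """Compare recent prediction error to a reference window of earlier error.

    `errors` is a chronologically ordered array of per-prediction errors
    (e.g. absolute error of LTV predictions once actual spend is known).
    Returns True if the recent mean error exceeds the reference mean by `tolerance`x.
    """
    reference = np.mean(errors[:reference_size])
    recent = np.mean(errors[-recent_size:])
    return recent > tolerance * reference

# Hypothetical example: spending behavior shifts, so prediction errors grow.
rng = np.random.default_rng(2)
stable_errors = rng.exponential(10, size=1_000)   # errors before the shift
drifted_errors = rng.exponential(18, size=200)    # errors after the shift
print(concept_drift_alert(np.concatenate([stable_errors, drifted_errors])))
```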
In some sense, you should always care about concept drift, at least to the extent of being aware that it has happened. Because concept drift reflects an underlying shift in the relationship between inputs and targets, model retraining is always required to capture these new correspondences. That said, you would only want to retrain the model if the relationships you're aiming to capture are still representative of your downstream business KPIs. While this will often be the case, it is not guaranteed. For example, your business model might shift such that you care more about the amount of time customers spend on your website (so that you can increase ad revenue) than about the amount of money they spend on your actual products (which may have been small to begin with). In such a circumstance, you'd probably want to train an entirely different model, so concept drift in the original model would no longer be a concern.
As our previous examples have illustrated, simply being alerted to the presence of data or concept drift is not sufficient. A deeper understanding of how shifts in the data distribution, or in the relationships between inputs and targets, are affecting model performance and downstream business KPIs is critical to addressing drift in the proper context. Many tools fail because they only alert data scientists to changes in the overall data distribution, when in fact changes to smaller, specific data segments often foreshadow more drastic distributional shifts. The key to successfully addressing drift is being alerted to these subtler, earlier shifts and attending to them promptly. By the time a drift is large enough to detect in the overall distribution, the problem has usually already manifested itself in multiple areas and significantly degraded model performance on substantial amounts of data. At that point, remedying the issue becomes a game of catch-up in which you are always one step behind, allowing data that your model was never properly trained on to flow through your system.
The proper way to address data and concept drift is to create a feedback loop within your business process and monitor your model in the context of the business function it serves. You want to decide on concrete, quantifiable performance metrics that let you rapidly assess how your model is performing at any given moment, and thereby understand whether changes in the data distribution correlate with a decrease in performance. Ultimately, this will allow you to connect input features to actual business outcomes and learn when the underlying concept has shifted. If it has, you can then understand it in context and decide whether it's worth taking steps to address it.
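As a toy sketch of what closing that loop might look like, the snippet below correlates per-window drift scores for a feature with a per-window business KPI. The specific scores, KPI values, and the `drift_vs_performance` helper are hypothetical.

```python
import numpy as np

def drift_vs_performance(drift_scores, kpi_values):
    """Correlate per-window drift scores with a per-window business KPI.

    `drift_scores` might be weekly KS statistics for a key feature;
    `kpi_values` might be the corresponding weekly marketing ROI or prediction
    accuracy. A strong negative correlation suggests the drift actually matters.
    """
    return np.corrcoef(drift_scores, kpi_values)[0, 1]

# Hypothetical weekly windows: as drift grows, the KPI sags.
weekly_drift = np.array([0.02, 0.03, 0.05, 0.09, 0.15, 0.22])
weekly_kpi = np.array([0.91, 0.90, 0.88, 0.84, 0.78, 0.70])
print(f"correlation: {drift_vs_performance(weekly_drift, weekly_kpi):.2f}")
```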
Finally, you want to ensure that you're measuring changes to your data at a granular level. In machine learning, ignoring the trees for the forest can allow errors to surface in problematic ways. Having a good understanding of your model's performance requires being attuned to specific segments of your data, as these are often the first to show issues before they propagate to the distribution as a whole. Continuing with our LTV model example, if customers in a smaller state, such as Rhode Island, were the first to receive their stimulus checks, the shift might not be significant enough to register across the entire distribution. However, knowing about the change could alert you to the fact that more global shifts in the data distribution were forthcoming (i.e., other states would soon be issuing stimulus checks). Detecting changes in data at the granular level is thus extremely important both for early identification of data and concept drift and for squeezing the best performance out of your ML models.
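Below is a rough sketch of what such granular monitoring might look like: running a drift test per segment (here, per US state) so that a shift confined to Rhode Island can raise an alert before the overall distribution visibly moves. The column names, data, and use of a KS test are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from scipy import stats

def segment_drift(train_df, prod_df, segment_col, feature_col):
    """Run a per-segment KS test so small segments (e.g. a single state)
    can alert before the overall distribution shows any drift."""
    rows = []
    for segment, prod_group in prod_df.groupby(segment_col):
        train_group = train_df[train_df[segment_col] == segment]
        if len(train_group) == 0 or len(prod_group) == 0:
            continue  # skip segments absent from either dataset
        stat, p = stats.ks_2samp(train_group[feature_col], prod_group[feature_col])
        rows.append({"segment": segment, "ks_statistic": stat, "p_value": p})
    return pd.DataFrame(rows).sort_values("ks_statistic", ascending=False)

# Hypothetical data: spend drifts in Rhode Island ("RI") before anywhere else.
rng = np.random.default_rng(3)
train = pd.DataFrame({"state": ["RI"] * 500 + ["NY"] * 500,
                      "spend": rng.normal(50, 10, 1_000)})
prod = pd.DataFrame({"state": ["RI"] * 500 + ["NY"] * 500,
                     "spend": np.concatenate([rng.normal(80, 10, 500),
                                              rng.normal(50, 10, 500)])})
print(segment_drift(train, prod, "state", "spend"))
```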
Data and concept drift both occur when a model is no longer performing as intended due to changes in data; however, they arise for different reasons. Data drift occurs when there is a shift in the input data distribution between training a model and serving it in production. Depending on how well the model generalizes to the new distribution, the shift may be inconsequential or it may require retraining. Concept drift, on the other hand, occurs when the underlying function mapping inputs to targets changes. In these cases, model retraining is nearly always required to capture the new relationships, assuming those relationships are still relevant to your downstream business KPIs. Ultimately, to detect data and concept drift, you want to establish a feedback loop between business outcomes and data features. You should also define robust performance metrics based on these outcomes that assess how well your model is doing, and correlate them with specific features. Finally, you want to make sure you are monitoring changes to your data at a granular level, so that you are alerted to distribution shifts before they propagate and affect the entire dataset. Contact us to see how Mona can help you monitor data and concept drift, or request a demo to see for yourself!