Gabriel Gonçalves

Anomaly Detection for DevOps

Updated: Jul 31, 2023

Detecting unusual behaviour is an extremely important part of DevOps, as it allows teams to identify and diagnose problems with their systems and applications before they become critical or cause downtime. One popular tool for anomaly detection is Facebook’s Prophet, which is a time series forecasting library that is particularly well-suited for identifying anomalies in heavily seasonal time series.

In this blog post, we will look at how to use Prophet for anomaly detection in DevOps. We will start by discussing the basics of Prophet and how it works, and then we will walk through a simple example of using Prophet to detect anomalies in a time series of performance metrics.

What is Prophet?

Prophet is a time series forecasting library that was developed by Facebook’s Core Data Science team. It is open-source and available on GitHub under the MIT license. Under the hood, Prophet fits its models with Stan, a probabilistic programming language for Bayesian inference, which it calls through a Python interface (PyStan in older releases, CmdStanPy in more recent ones).

The key features of Prophet include:

  1. Support for univariate time series (one model per metric)

  2. Automatic handling of yearly, weekly, and daily seasonality, plus support for holiday effects

  3. Built-in support for missing data

  4. Ability to include custom regressors and changepoints

  5. A simple, intuitive API that is easy to use and extend

Prophet is designed to be a “plug and play” solution for time series forecasting, which makes it a great choice for DevOps teams who want to quickly and easily add anomaly detection to their existing monitoring and alerting systems.

How does Prophet work?

Prophet uses a decomposable time series model that is based on three main components:

  1. A trend component, which captures the overall shape of the time series

  2. A seasonal component, which captures periodic patterns in the data

  3. A residual component, which captures any remaining noise in the data

By default, the trend is modeled as a piecewise linear curve (a saturating logistic curve is also available) whose slope can change at automatically selected changepoints. Each seasonal component is modeled as a Fourier series, a sum of sine and cosine terms that can approximate smooth periodic patterns. The residual component is treated as normally distributed noise around the sum of the other components.

Prophet uses a combination of these three components to generate a forecast for the time series, which can then be used to identify anomalies in the data. An anomaly is defined as a point in the time series that is significantly different from the forecast, and these points can be flagged as potential issues that need to be investigated.
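
To make the idea of a decomposable model more concrete, here is a minimal sketch that builds a synthetic series as the sum of a trend, a weekly seasonal term, and noise. It only uses NumPy and does not call Prophet. The straight-line trend, the single sine wave, and all the numbers are simplifying assumptions chosen for illustration, not what Prophet fits internally.

```python
import numpy as np

# Hypothetical hourly series covering four weeks (not the dataset used in this post)
hours = np.arange(4 * 7 * 24)
rng = np.random.default_rng(seed=0)

# Trend: a slow linear drift in the metric
trend = 50.0 + 0.01 * hours

# Seasonality: a single weekly sine wave (period of 168 hours)
weekly_period = 7 * 24
seasonality = 10.0 * np.sin(2 * np.pi * hours / weekly_period)

# Noise: the part that trend and seasonality do not explain
noise = rng.normal(loc=0.0, scale=1.0, size=hours.size)

# The observed series is the sum of the three components
y = trend + seasonality + noise

# A forecast is built from the structured components only; what is left over
# (the residual) is what gets compared against the expected range
forecast = trend + seasonality
residual = y - forecast
```

In this toy setup the residual is exactly the injected noise, so a point whose residual is much larger than the typical noise level would stand out as an anomaly.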

An example of using Prophet for anomaly detection

Now that we have a basic understanding of how Prophet works, let’s look at a simple example of using Prophet for anomaly detection in DevOps. In this example, we will use Prophet to detect anomalies in a time series of CPU utilization metrics for a server.

First, we need to install Prophet and its dependencies. This can be done using the pip package manager, as follows:



pip install prophet 

Next, we need to load the time series data into a pandas dataframe, and make sure that the data is formatted correctly for Prophet. The data should have a ds column that contains the timestamp for each data point, and a y column that contains the value of the metric at that timestamp. In this example, we will use a CSV file called cpu_usage.csv that has the following format:

ds                    y
2022-01-01 00:00:00    95.79347943789419
2022-01-01 01:00:00    94.99732043364972
2022-01-01 02:00:00    100.0
2022-01-01 03:00:00    88.2696042150969
2022-01-01 04:00:00    97.78023909290818 

This CSV contains a fictional dataset of the CPU usage of an application over two months, measured every hour as a percentage.
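
If you do not have a comparable file at hand, a stand-in can be generated. The sketch below is an assumption on our side and not the dataset used in this post: the function name make_synthetic_cpu_usage, the baseline level, the size of the weekly swing, and the number of spikes are all hypothetical choices that merely mimic the format described above.

```python
import numpy as np
import pandas as pd


def make_synthetic_cpu_usage(days=60, seed=0):
    """Build an hourly CPU-usage series with a weekly pattern and a few spikes."""
    rng = np.random.default_rng(seed)
    ds = pd.date_range("2022-01-01", periods=days * 24, freq=pd.Timedelta(hours=1))
    hours = np.arange(ds.size)

    # Weekly pattern around a baseline of 60%, plus some measurement noise
    weekly = 15.0 * np.sin(2 * np.pi * hours / (7 * 24))
    y = 60.0 + weekly + rng.normal(0.0, 2.0, size=ds.size)

    # Inject a handful of artificial spikes to act as anomalies
    spike_idx = rng.choice(ds.size, size=5, replace=False)
    y[spike_idx] += 30.0

    # CPU usage is a percentage, so keep it within 0-100
    y = np.clip(y, 0.0, 100.0)
    return pd.DataFrame({"ds": ds, "y": y})


df_synthetic = make_synthetic_cpu_usage()
# Uncomment to write the file used in the rest of the post:
# df_synthetic.to_csv("cpu_usage.csv", index=False)
```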

We can then simply load it into a dataframe as shown here:


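import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from prophet import Prophet
# Load the dataset and parse the timestamps
df = pd.read_csv('cpu_usage.csv', parse_dates=['ds'])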

We can now better visualize the data by plotting it:

plt.figure(figsize=(16, 9))
sns.lineplot(data=df, x='ds', y='y')
plt.show() 

As we can see, this dataset shows a very clear seasonal pattern every week, and there are a few spikes in CPU usage that do not look normal. These are the anomalies we want to detect.

Training the model

We start by dividing the loaded dataset into 30 days for training (30*24 time periods), and the rest for testing our anomaly detection outside our training set:

#split dataset into train and test
train_df = df[:30*24]
test_df = df[30*24:]
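
Slicing by row count assumes exactly one row per hour with no gaps. As a hedged alternative, the split can be made on the timestamps themselves. The helper name split_by_time and the small example frame below are hypothetical and only illustrate the idea.

```python
import pandas as pd


def split_by_time(df, train_days):
    """Split a ds/y dataframe at `train_days` after its first timestamp."""
    df = df.sort_values("ds").reset_index(drop=True)
    cutoff = df["ds"].min() + pd.Timedelta(days=train_days)
    train = df[df["ds"] < cutoff]
    test = df[df["ds"] >= cutoff]
    return train, test


# Small hypothetical example: 3 days of hourly data, 2 of them for training
example = pd.DataFrame({
    "ds": pd.date_range("2022-01-01", periods=72, freq=pd.Timedelta(hours=1)),
    "y": range(72),
})
train_example, test_example = split_by_time(example, train_days=2)
```

With complete hourly data this gives the same result as the positional slices, but it keeps the training window at 30 calendar days even when some rows are missing.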
 

We then create the model and fit it to the data:

# Create a new Prophet model
model = Prophet(interval_width = 0.999999)

# Fit the model to the time series data using the first 30 days of data
model.fit(train_df)
 

Then, we use make_future_dataframe to create a dataframe with the timestamps we want to predict, and ask Prophet to forecast for these timestamps:

# Make a dataframe of the time periods to be forecast
future_dataframe = model.make_future_dataframe(periods=len(test_df), freq='H')

# Generate a forecast for the train and test set at the same time
forecast = model.predict(future_dataframe)

# Keep forecasts within possible values (0-100%)
forecast['yhat'] = forecast['yhat'].clip(0,100)
forecast['yhat_lower'] = forecast['yhat_lower'].clip(0,100)
forecast['yhat_upper'] = forecast['yhat_upper'].clip(0,100)

# Split forecast into train and test parts
forecast_train = forecast[:30*24]
forecast_test = forecast[30*24:] 

When Prophet produces a forecast, it also returns an uncertainty interval around each prediction, within which values are expected to fall with a given probability. The width of this interval is set through the interval_width parameter used above (the default is 0.8); a value as high as 0.999999 makes the band very wide, so only extreme deviations are flagged. To perform simple anomaly detection, all we have to do is check whether each observed value falls within the interval bounded by the yhat_lower and yhat_upper columns of the forecast dataframe:

# Identify points where the observed value falls outside the uncertainty interval
anomalies = test_df[(test_df["y"] > forecast_test["yhat_upper"]) | (test_df["y"] < forecast_test["yhat_lower"])] 

The anomalies dataframe will contain the data points that are considered anomalies, and these points can be flagged for further investigation by the DevOps team.
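
The comparison above relies on the test and forecast dataframes sharing the same row index. A slightly more defensive variant, sketched here under the assumption that both frames carry a ds column, joins them on the timestamp and also records how far each point lies outside the interval. The function name find_anomalies and the excess column are hypothetical names introduced for this illustration.

```python
import pandas as pd


def find_anomalies(actual, forecast):
    """Join observations to forecasts on `ds` and keep points outside the interval."""
    merged = actual.merge(
        forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]], on="ds", how="inner"
    )
    above = merged["y"] > merged["yhat_upper"]
    below = merged["y"] < merged["yhat_lower"]

    # Distance from the nearest interval bound (0 for points inside the interval)
    merged["excess"] = (
        (merged["y"] - merged["yhat_upper"]).clip(lower=0)
        + (merged["yhat_lower"] - merged["y"]).clip(lower=0)
    )
    return merged[above | below]


# Tiny hand-made example with a fixed 40-60 interval
timestamps = pd.date_range("2022-01-01", periods=3, freq=pd.Timedelta(hours=1))
actual_example = pd.DataFrame({"ds": timestamps, "y": [50, 90, 10]})
forecast_example = pd.DataFrame({
    "ds": timestamps,
    "yhat": [50, 50, 50],
    "yhat_lower": [40, 40, 40],
    "yhat_upper": [60, 60, 60],
})
found = find_anomalies(actual_example, forecast_example)
```

Sorting the result by the excess column gives a rough way to prioritise which points to look at first.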

Plotting the results

To better visualize the results of our anomaly detector, we can use the following code:

plt.figure(figsize=(16, 9))

sns.lineplot(data=test_df, x='ds', y='y')
sns.lineplot(data=forecast_test, x='ds', y='yhat')
plt.fill_between(x='ds',y1='yhat_lower',y2='yhat_upper',data = forecast_test, alpha = 0.4)
sns.scatterplot(data=anomalies, x='ds', y='y',color= 'red')

plt.show() 

Running this, we obtain the following plot, where the blue line represents our dataset, the orange line represents our forecast, and the red dots represent the detected anomalies.

As can be seen in the plot, all the anomalies were easily detected, and could be later forwarded to the DevOps team so that they can better optimize the systems in which this application was running, and hopefully avoid the occurrence of these anomalies in the future.

In summary, Prophet is a powerful and easy-to-use tool for anomaly detection in DevOps. By fitting a decomposable time series model to performance metrics data, Prophet can generate forecasts that can be used to identify anomalies in the data and flag them for further investigation. This can help DevOps teams to proactively diagnose and fix issues with their systems and applications, and prevent downtime and other problems.

This post showed only a simple example. If your applications require more complex workflows, the code shown is available here and you can customize it to your liking. If you want to do something even more complex, we’re here to help!

Get in contact with us through our LinkedIn page and we can help you integrate this and other types of time series forecasting into your workflows.
