Sales Forecasting using Prophet Time Series Analysis and Python

Sales forecasting and Goal setting is a common task in most sales driven organizations. A good forecast allows firms to adequately plan and manage their short and long-term goals, with a degree of confidence. It also allows leaders to establish and allocate the right targets to their teams and encourages sales teams to develop a healthy pipeline of opportunities to achieve their goals.

While crucial, it’s also hard to get precise forecasts. In most businesses, sales is a function of many attributes, both macro (e.g. inflation, consumer sentiment/ confidence, employment rates, gdp etc) and micro (e.g firm’s products, capabilities, consumer demand, marketing efforts, retention, incentives etc). It’s impossible to develop a model, complex enough to capture all these variables, however, for managerial decision making, a simple, easy to understand model often works better.

In this very practical example, we are going to use Prophet, an open source library for time series analysis, to derive meaningful insights from past observed data, and use it to predict future values, without worrying too much about intrinsic factors. Goal is also not to get the most precise forecast, but rather a good enough baseline to then establish the firm’s goals.

Broadly, here are the steps. We will discuss the ones in bold in more detail.

  1. Get a sense of underlying business drivers, and how they are expected to change in future.

  2. Collect historical performance data for building your forecasts. e.g. Sales data for your product by time

  3. Perform time series analysis to understand general trend and seasonality

  4. Use a time series forecasting model / technique to predict future values

  5. Evaluate the forecast accuracy / usability based on some predefined performance metrics

  6. Develop a bottoms up forecast independently, based on inputs from your team/business units

  7. Review the gap between our forecast and the bottoms up projections and establish goals

#1. Get a sense of underlying business drivers, and how they are expected to change in future

Using a simple framework like Potter's 5 forces or SWOT, evaluate where your firm is today, what  are it's core differentiated capabilities, who the customers are, what products are in the pipeline, where the industry, competitors, substitutes and suppliers are headed etc. Since forecasts are made using past data, this forward looking groundwork is really critical before making sense of any forecasts.

If you are forecasting just revenue from an individual client, you could still evaluate that account based on drivers related to client's business, industry, competitors etc 

#2. Collect historical performance data for building your forecasts

While more data is always better, usually, a 3-5 year time range is sufficient. This however, may vary depending on the industry and stage of maturity of your firm or product line. If you are an early stage firm, or in a rapidly growing industry, we would use a different forecasting approach such as a business driver model, where you would first forecast the underlying factor like unique user growth and apply a cost per user value to estimate future performance. We will explore this in a later post. For now, let’s assume your firm is in a relatively mature stage (see the chart below), and has some consistent seasonal patterns.

 
sales_2014_2018.PNG
 
 

For the rest of this article, we will use Google Colab, to import the data and required libraries, and perform the analysis using python. The entire code is also available on github. I won’t go into the details of the code but feel free to reach out if any of it is not clear. We will primarily focus on the results and the interpretation.

When you plot the time series data, there are some clear patterns. The trend is definitely positive and growing, and there are some annual seasonal patterns. For instance, revenue seems to spike in Dec/Jan and falls every Feb. There was also a sudden spike in early 2016, when we significantly dropped our prices, that undercut our margins, but captured share in a volatile market


df = pd.read_csv("https://raw.githubusercontent.com/anojrs/Prophet_Example/master/sales_data_eg.csv") #load example data into a dataframe
df.set_index('date').revenue.plot() #plot the data

Daily_daat.png

#3. Perform time series analysis to understand general trend and seasonality

We will now set up Prophet to analyse our time series data and then later make prediction.


!pip install fbprophet #install prophet
from fbprophet import Prophet
df.columns = ['year','ds','y']  #prophet requires dataset columns in a certain format. Needs a date filed called ds and value field called y

Since our time series data is not stationary, one nifty workaround is using a box cox transformation to transform the data and then apply our forecasting technique.


from scipy.stats import boxcox
df['y_orig'] = df['y']
df['y'], lam = boxcox(df['y'])

Now let’s initialize a prophet model, and fit our transformed data into it. Prophet also allows us to include all national holidays or custom define our own list of holidays. Once that’s done we will predict values for the next 365 days.


m = Prophet(interval_width=0.90) #90% confidence internal with yearly seasonality
m.add_country_holidays(country_name = 'US') #include US holidays
m.fit(df)
future = m.make_future_dataframe(periods = 365, freq = 'd') #create 365 extra days for forecasting purpose i.e till end of 2019
forecast_t = m.predict(future); # forecast data using our transformed data


One useful result from Prophet is a view to split the time series into Trend, holidays and weekly and monthly seasonality as below, which confirms our earlier hypothesis. General trend is positive and growing. Holidays in general have a negative impact on our sales. Monday is our best day of week, and Sunday being the weakest. And, July, Nov, Dec and Jan, in general are our best months.


m.plot_components(forecast_t);

plot_components.png

 #4. Use a time series forecasting model / technique to predict future values

Now let’s look at the predictions for 2019. The black dots are historical daily actuals, blue area is the predictions/ best fit curve, and the light blue line is our 90% confidence interval


m.plot(forecast_t);  #Light blue space is the 90% confidence interval
transformed_forecast.png

However, before we can use any of these, remember these are predictions for our box cox transformed data. Let us inverse transform it and look at the results again


from scipy.special import inv_boxcox
forecast = pd.DataFrame()
forecast['ds'] = forecast_t['ds']
forecast[['yhat','yhat_upper','yhat_lower']] = forecast_t[['yhat','yhat_upper','yhat_lower']].apply(lambda x: inv_boxcox(x, lam))

m.history['y_t'] = m.history['y']
m.history['y'] = m.history['y_orig']

df['y_t'] = df['y']
df['y'] = df['y_orig']


The Below data is our required predictions


m.plot(forecast);

original_forecast.png

Now if we sum up the daily means, we get a forecast of $14.1M for 2019, which is a 15% YoY growth from 2018

revenue.PNG
 

#5. Evaluate the forecast accuracy / usability based on some predefined performance metrics

There are various statistical metrics to evaluate accuracy of predictions. We will first look at three common ones. R squared, Mean Squared Error and Mean Absolute error.


metric_df = forecast.set_index('ds')[['yhat']].join(df.set_index('ds').y).reset_index()
metric_df.dropna(inplace=True)
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
r2_score(metric_df.y, metric_df.yhat), mean_squared_error(metric_df.y, metric_df.yhat), mean_absolute_error(metric_df.y, metric_df.yhat)

(0.8732446260947464, 5023915.378259462, 1658.3597924862095)

 The above results look quite positive. A high R squared can also indicate over fitting. MAE of $1685 indicates that on an average, our daily forecast could be off by $1685 on either directions, which is not bad considering, our daily revenue is upwards of ~$37k, i.e. +/- 5% error.

Prophet also provides some useful diagnostic tools including cross validation metrics. You can read more here.


from fbprophet.diagnostics import cross_validation
df_cv = cross_validation(m, initial='730 days', period='180 days', horizon = '365 days')


from fbprophet.plot import plot_cross_validation_metric
fig = plot_cross_validation_metric(df_cv, metric='mape')

crossvalidation_mape.png

#6. Develop a bottoms up forecast independently, based on inputs from your team/business units or through your analysis in step #1

Our time series forecast indicates a growth potential of 15% YoY in 2019. It is always good to develop another forecast to sense check our results. In most organizations, teams would provide a bottoms up forecast based on their knowledge of their accounts, opportunities in the pipeline and other factors. Let’s assume we receive $13.5M (+10% YoY) through our bottoms up process.

#7. Review the gap between our forecast and the bottoms up projections and establish goals

In this last step, we will put all this together to establish the firm’s goals/ targets. This is more Art than science. We will cover this in a separate article in more detail. But the general intuition is that if we now have 2 sets of numbers, a lower bottoms up from teams/ business units of +10% YoY, and a higher Prophet forecast of +15% YoY, managers would then review their findings in step #1 to decide how conservative or aggressive they need to be. In some firms, they may pick a lower number say a +12% YoY, and set that as their base target, and use +15% as the stretch goal, and establish additional rewards and bonuses to incentivize the teams to aim higher. There could also be cases where the managers have access to additional confidential information, like a one-off product issue or recall, that may negatively affect the next year’s performance, in which case, they may even decide to go below the bottoms up. There is clearly no perfect answer. However, a time series forecast at least gives you a possible range of values to use for your decision making.

 
final forecast.PNG
 
Comment