When analyzing data, we often need to model the phenomena that we actually observe. One example is modelling the sales of goods using various supporting features, e.g. advertising campaigns and promotions, but also inherent attributes of the product such as price and distribution. One of the most popular approaches for capturing the relationship between these features and sales is a class of models called MMM (Marketing Mix Modelling).
Marketing mix modelling is an analytical approach that often uses store- or retailer-level sales data to assess the impact of various marketing activities on sales. Mathematically, it expresses the relationship between marketing activities and sales as a linear or non-linear equation, estimated with the statistical technique of regression.
MMM quantifies the effectiveness of each marketing element in terms of its contribution to sales volume. The results are then used to adjust marketing tactics and strategies, to optimize the marketing plan, and to forecast sales while simulating various scenarios.
In this article I will not explain in detail what MMM is; instead, I want to focus on the mathematical interpretation of the assumptions behind this type of model.
The task that the MMM approach has to solve is to find the relationship expressed by the following equation:
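In generic notation, and only as a schematic shorthand (the function name f and the grouping of the arguments are assumptions made here for illustration), such a relationship can be written as:

```latex
\text{Sales} = f(\text{Price},\ \text{Promotion},\ \text{Marketing},\ \text{Controls},\ \dots)
```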
We can see a functional dependence of sales on such important aspects as price, promotion and marketing activities.
Potential predictors also include control variables such as seasonality and special events with a significant impact on sales, e.g. Black Friday, Cyber Monday or Christmas.
It is also important to take into account additional factors such as lockdowns resulting from COVID-19, as well as macroeconomic metrics that are affected by the geopolitical situation.
The mathematical form of the model is as follows:
- β is called the price elasticity (we expect it to be negative: the higher the price, the lower the sales volume),
- ExternalEvent is a simple binary (0 or 1) variable indicating whether the event took place or not,
- Distribution ranges between 0% and 100% and indicates the share of stores (in %) that offer the target brand / product,
- Discount is expressed as the percentage depth of the promotion,
- Marketing is most commonly measured by spending or by another KPI strongly connected with the effectiveness of marketing campaigns (e.g. GRPs or impressions).
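A multiplicative form consistent with the definitions above can be sketched as follows. Only β is named in the text; the other coefficient symbols (α, γ, δ, θ, λ) and the exact set of terms are assumptions introduced here for illustration. Price and Distribution enter as powers, while Discount, Marketing and ExternalEvent enter through exponential terms:

```latex
\text{Sales} = e^{\alpha} \cdot \text{Price}^{\beta} \cdot \text{Distribution}^{\gamma}
  \cdot e^{\delta \cdot \text{Discount}} \cdot e^{\theta \cdot \text{Marketing}}
  \cdot e^{\lambda \cdot \text{ExternalEvent}}
```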
The formula above is multiplicative, which means that sales are expressed as a product of terms related to the predictors. This kind of equation is awkward to estimate directly; to estimate the parameters more efficiently, it is a good idea to transform both sides of the equation.
The most commonly used model in MMM is the log-log approach, which means that both sides of the formula undergo a logarithmic transformation before the parameters are estimated. The logarithm turns the product into a sum, so the parameters can be estimated with ordinary linear regression.
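As a sketch of what this transformation produces, assume a multiplicative model in which Price and Distribution appear as powers and Discount, Marketing and ExternalEvent appear inside exponentials (all coefficient symbols other than β are assumed names). Taking the natural logarithm of both sides then gives an equation that is linear in the coefficients:

```latex
\text{Sales} = e^{\alpha} \cdot \text{Price}^{\beta} \cdot \text{Distribution}^{\gamma}
  \cdot e^{\delta \cdot \text{Discount} + \theta \cdot \text{Marketing} + \lambda \cdot \text{ExternalEvent}}
\;\;\Longrightarrow\;\;
\ln \text{Sales} = \alpha + \beta \ln \text{Price} + \gamma \ln \text{Distribution}
  + \delta \cdot \text{Discount} + \theta \cdot \text{Marketing} + \lambda \cdot \text{ExternalEvent}
```

Note that under this assumption only the power terms end up log-transformed on the right-hand side; the variables that sat inside exponentials appear untransformed.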
In most articles dedicated to MMM we encounter the regression equation in exactly this form. Unfortunately, although it is easier to estimate, it is not a form that is simple to interpret and explain. It is hard to understand the implications of the behaviour of the variables included in a log-log model.
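To make the estimation step concrete, here is a minimal sketch in Python using NumPy. Everything in it is an assumption made for illustration: the data are synthetic, the coefficient names and "true" values are invented, and plain least squares stands in for whatever estimator a real MMM project would use.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic predictors (all values are made up for illustration)
price = rng.uniform(2.0, 6.0, n)          # price per unit
distribution = rng.uniform(0.3, 1.0, n)   # share of stores offering the product
discount = rng.uniform(0.0, 0.3, n)       # discount depth, 0% to 30%
marketing = rng.uniform(0.0, 1.0, n)      # scaled spend or GRPs
event = rng.integers(0, 2, n)             # binary external event (0 or 1)

# Hypothetical "true" parameters of the assumed multiplicative model
alpha, beta, gamma, delta, theta, lam = 5.0, -1.5, 0.8, 1.2, 0.3, 0.2

sales = (
    np.exp(alpha)
    * price**beta
    * distribution**gamma
    * np.exp(delta * discount + theta * marketing + lam * event)
    * np.exp(rng.normal(0.0, 0.05, n))    # multiplicative noise
)

# Log-transform sales and the power-type variables only;
# the variables that sit inside exponentials stay untransformed.
X = np.column_stack([
    np.ones(n),
    np.log(price),
    np.log(distribution),
    discount,
    marketing,
    event,
])
y = np.log(sales)

# Ordinary least squares on the linearised model
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
names = ["alpha", "beta", "gamma", "delta", "theta", "lambda"]
estimates = dict(zip(names, coef))

for name, value in estimates.items():
    print(f"{name:>6}: {value: .3f}")
```

With this setup the estimated coefficients should land close to the values used to generate the data, which is a quick sanity check that the linearised regression and the multiplicative model describe the same relationship.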
One of the doubts plaguing an analyst who considers using the log-log model is whether a given variable should be log-transformed or not. The answer to this seemingly difficult question is provided by the multiplicative form of the model, where it becomes almost obvious.
We can distinguish two types of variables in the multiplicative formula:
- Influential, like price and distribution, which have a huge impact on sales,
- Incremental, like discount and marketing, which are treated as a correction; their impact is significantly lower.
Using the theory of limits, we can observe how the influential variables behave at their extremes:
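A sketch of the kind of limits involved, assuming power terms Price^β with β < 0 and Distribution^γ with γ > 0, and an exponential term for an incremental variable such as Discount (γ and δ are assumed symbols):

```latex
\lim_{\text{Distribution} \to 0^{+}} \text{Distribution}^{\gamma} = 0 \quad (\gamma > 0),
\qquad
\lim_{\text{Price} \to \infty} \text{Price}^{\beta} = 0 \quad (\beta < 0),
\qquad
e^{\delta \cdot 0} = 1
```

Because sales are a product of these terms, a single influential variable at its extreme is enough to push the whole product towards zero, whereas an exponential term evaluated at zero multiplies sales by 1 and leaves them unchanged.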
The other variables in the model are perceived slightly differently. They can be viewed as incremental variables: a lack of promotions, marketing activities or Black Friday events does not mean that there are no sales.
When such events occur, we observe an increase in sales (for advertising spend or promotions) or a decrease in sales (e.g. for a lockdown). However, the effect of these variables will never be as drastic as in the case of price, distribution or the size of the assortment.
The lowest value of an incremental variable (commonly 0) corresponds to the ‘base level’ of sales.
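The difference between the two kinds of variables can be checked numerically. The sketch below assumes a multiplicative model with power terms for price and distribution and exponential terms for the incremental variables; the function name `predicted_sales` and all coefficient values are hypothetical.

```python
import math


def predicted_sales(price, distribution, discount=0.0, marketing=0.0, event=0,
                    alpha=5.0, beta=-1.5, gamma=0.8,
                    delta=1.2, theta=0.3, lam=0.2):
    """Assumed multiplicative model; every coefficient value is hypothetical."""
    return (
        math.exp(alpha)
        * price**beta
        * distribution**gamma
        * math.exp(delta * discount + theta * marketing + lam * event)
    )


# Base level: all incremental variables at their lowest value (0)
base = predicted_sales(price=4.0, distribution=0.8)

# A 20% discount multiplies the base level by exp(delta * 0.2)
with_promo = predicted_sales(price=4.0, distribution=0.8, discount=0.2)
uplift = with_promo / base

# An influential variable at its extreme wipes out sales entirely
no_distribution = predicted_sales(price=4.0, distribution=0.0)

print(f"base level:         {base:.2f}")
print(f"uplift multiplier:  {uplift:.3f}")
print(f"sales without distribution: {no_distribution:.2f}")
```

Setting an incremental variable to zero leaves the base level untouched, while setting distribution to zero removes sales altogether, which is the asymmetry described above.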
To conclude, the incremental variables, which enter the model through an exponential function, should not be log-transformed, in contrast to the variables that do not enter it that way, e.g. price and distribution.
The relationship between sales and the independent variables is clear when we look at the formula in its multiplicative form. After the log-log transformation it is really hard to capture the nuances of their interpretation.
As the main conclusion, I would like to say that when we reflect on the meaning of the models we are responsible for, it is worth taking a step back and trying to interpret what we are doing.
So when you have a log-log model, please consider the reverse transformation to obtain the multiplicative form, from which it is much easier to draw conclusions.
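As an illustration of that reverse transformation, the sketch below turns the coefficients of a fitted log-log regression into multipliers on sales. The coefficient values, the dictionary keys and the helper names are hypothetical; the model structure (logged price and distribution, untransformed incremental variables) is the assumption used throughout.

```python
import math

# Hypothetical coefficients from a fitted log-log regression
coef = {
    "intercept": 5.0,
    "log_price": -1.5,         # price elasticity
    "log_distribution": 0.8,
    "discount": 1.2,
    "marketing": 0.3,
    "event": 0.2,
}


def power_effect(name, pct_change):
    """Relative change in sales when a log-transformed variable changes by pct_change."""
    return (1.0 + pct_change) ** coef[name] - 1.0


def incremental_effect(name, value):
    """Relative uplift over the base level when an untransformed variable takes `value`."""
    return math.exp(coef[name] * value) - 1.0


# Scale factor of the multiplicative form (sales when every other factor equals 1)
scale = math.exp(coef["intercept"])

price_up_10 = power_effect("log_price", 0.10)      # effect of a 10% price increase
promo_20 = incremental_effect("discount", 0.20)    # effect of a 20% discount
event_on = incremental_effect("event", 1)          # effect of the external event

print(f"10% price increase: {price_up_10:+.1%} sales")
print(f"20% discount:       {promo_20:+.1%} sales")
print(f"external event:     {event_on:+.1%} sales")
```

Read this way, each coefficient becomes a statement about a percentage change in sales, which is usually easier to communicate than a coefficient on a logarithm.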