Causal inference helps make better business decisions

We make sense of events in the world as collections of causes and their effects. If we get a headache, we know that taking a painkiller may ease the pain. We leave for work before rush hour because we know we’ll likely avoid the traffic and, as a result, our commute will be shorter. Business leaders and policy makers likewise make decisions with a goal or a result in mind. Think of a marketing department launching a new media campaign to increase sales, or a bank introducing savings recommendations to encourage customers to transfer more funds into savings and investment accounts.

Causal thinking is a human perspective

As analysts, data scientists and machine learning engineers, we often develop predictive analytics solutions to answer our stakeholders’ business questions. But their questions are about the impact of new actions, while predictive models provide information about associations and correlations between predictors and an outcome; they don’t necessarily point to the root cause of that outcome. So, beyond the difficulty of deciding whether a fresh marketing strategy will lead to higher sales, we would be hard-pressed to say whether sales would have increased even if we hadn’t launched that marketing campaign.

You might be surprised to hear that model predictions don’t point to the causes of the outcome they are trained to predict. This is because predictive models surface the predictors that are most strongly associated with the outcome. It is why, for example, we can say that a lifestyle of a vegetable-rich diet, exercise, not smoking and moderate alcohol consumption is associated with longevity, yet we don’t know for certain that these factors are the root causes of longevity. When it comes to making decisions, however, these associative factors are translated into rules that give us the impression that if we drink less and exercise more, we gain more years on earth. We, causal creatures, seek causal explanations from prediction models that aren’t equipped to provide them.

Causal inference practices

The best way to decide whether a factor causes an outcome is to conduct a randomized controlled trial, the standard practice in medicine. In such a trial or experiment, we randomly assign participants to a control and a test group, to ensure that we randomly spread any additional factors that might affect the results, such as age, gender, and certain physiological and lifestyle-related characteristics. Random assignment ensures that the control and test groups are balanced for these additional factors, which are called covariates or potential confounders.
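The logic of random assignment can be illustrated with a small simulation. In the sketch below (all numbers are made up for illustration), each participant has an unobserved covariate that also influences the outcome; because assignment to the test group is random, the simple difference in group means still recovers the true treatment effect:

```python
import random

random.seed(0)

def run_trial(n=10_000, true_effect=2.0):
    """Simulate a randomized controlled trial and estimate the effect."""
    outcomes = {"test": [], "control": []}
    for _ in range(n):
        covariate = random.gauss(0, 1)              # hidden factor, e.g. baseline health
        group = random.choice(["test", "control"])  # random assignment balances it out
        treated = 1 if group == "test" else 0
        outcome = 5.0 + true_effect * treated + 1.5 * covariate + random.gauss(0, 1)
        outcomes[group].append(outcome)
    mean = lambda xs: sum(xs) / len(xs)
    # difference in group means estimates the causal effect
    return mean(outcomes["test"]) - mean(outcomes["control"])

estimate = run_trial()
```

Despite the covariate adding noise to every individual outcome, randomization keeps it balanced across the groups, so the estimate lands close to the true effect of 2.0.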

Experiments are quite common in e-commerce, where they are better known as A/B tests. The large amount of traffic on a company’s website means that it can test the impact of webpage layout, customized product recommendations and other features on large groups of people, by comparing users exposed to these changes with users who were not. But there are limitations to A/B tests: while the sample size is often large enough for us to be confident about the effects we detect, we don’t know everything about every user, so we don’t know whether additional factors played a role in the outcome. Another limitation, true for randomized controlled trials in medicine and A/B testing alike, is that there are experiments we don’t want to run on real people, namely when we want to know the detrimental effect of certain factors. This is true of a harmful drug or procedure, or of offensive and hurtful content online. We do want to know the adverse impact of these factors, so we can prevent the harmful causes, but we don’t want to subject actual people to them!
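A common way to analyze such an A/B test is a two-proportion z-test on conversion rates. A minimal sketch, using hypothetical visitor and conversion counts (not real data):

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)      # pooled rate under the null
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# illustrative numbers: 4.8% vs. 5.6% conversion on 10,000 visitors each
z, p = two_proportion_z(conv_a=480, n_a=10_000, conv_b=560, n_b=10_000)
```

A small p-value tells us the difference is unlikely to be chance, but, as the paragraph above notes, it cannot tell us whether unmeasured user characteristics contributed to it.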

Causal inference use cases in Eraneos projects

Causal machine learning can help us ask questions about both beneficial and harmful effects without conducting actual experiments, by analyzing observational data. Here’s an example from our experience at Eraneos: I was part of a project led by my colleague, Gergely Bognár. Together, we developed a machine learning model to predict the likelihood of burnout among employees. In this model, we used various datasets covering employees, their burnout scores, and many other factors characterizing the individuals, such as gender and family status, and their situation at work, such as seniority and workload. While the model performed well in predicting individuals’ burnout scores, we were still in the dark about which factors are root causes of burnout and which are merely associated with it. To answer this, I compared a few causal machine learning models that simulated each distinct factor, such as workload, as a hypothetical “intervention” and the burnout score as the “outcome”. We found that working overtime had the greatest impact on the burnout score, but we also found strong statistical dependencies between age and other workload factors, such as working overtime and working more night shifts (for example, at hospitals). We also found that some datasets are biased with respect to factors such as gender, for example, by not having sufficient data on female workers in senior roles. This project illustrates that causal machine learning can help us understand causal relations in the data and, at the same time, point to our analytical limitations and our data’s potential biases.
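The core idea behind estimating such an “intervention” from observational data can be shown with a toy example (synthetic data, not the burnout project’s actual data or method). Here, a hypothetical confounder, seniority, drives both overtime and burnout, so the naive comparison overstates the effect of overtime; adjusting for the confounder by stratifying recovers the true effect:

```python
import random

random.seed(1)

def simulate(n=50_000, true_effect=1.0):
    """Synthetic observational data: seniority confounds overtime and burnout."""
    rows = []
    for _ in range(n):
        senior = 1 if random.random() < 0.4 else 0
        # seniors are much more likely to work overtime...
        overtime = 1 if random.random() < (0.7 if senior else 0.2) else 0
        # ...and report higher burnout for reasons unrelated to overtime
        burnout = 2.0 * senior + true_effect * overtime + random.gauss(0, 1)
        rows.append((senior, overtime, burnout))
    return rows

def naive_effect(rows):
    """Raw difference in mean burnout: overtime vs. no overtime."""
    treated = [b for _, t, b in rows if t == 1]
    control = [b for _, t, b in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def adjusted_effect(rows):
    """Backdoor adjustment: average within-stratum effects, weighted by stratum size."""
    total, effect = len(rows), 0.0
    for s in (0, 1):
        stratum = [(t, b) for s_, t, b in rows if s_ == s]
        treated = [b for t, b in stratum if t == 1]
        control = [b for t, b in stratum if t == 0]
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        effect += (len(stratum) / total) * diff
    return effect

rows = simulate()
```

The naive estimate mixes the effect of overtime with the effect of seniority, while the adjusted estimate isolates the overtime effect, which is exactly the kind of separation between association and cause described above.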

In addition to applying causal inference to observational data, causal machine learning has been useful in projects where experiments were limited. We have been supporting Hallmark Cards’ efforts to optimize their card selection and layout in stationery stores. Due to various limitations, we couldn’t carry out large-scale experiments with innovative layouts in these stores; we had to make do with a small number of participating stores for a brief time. Causal machine learning and rigorous post-hoc analysis of the experiments allowed us to evaluate the potential impact of each experiment had it been carried out in more stores. Here, causal inference complemented traditional analysis of variance (ANOVA) to answer the causal business question with a high degree of confidence.

Incorporating causal inference into your data science and machine learning pipeline doesn’t have to be difficult: it can become an integral part of the process, just like feature and model selection, and can in fact inform these steps. A simple causal model that considers just the top three to five most important factors in a model will already give you a better idea of which factors stand in a causal relation to the outcome.

Causal inference is also useful for evaluating the impact of an existing prediction model or decision-making process. Say, for example, that you would like to evaluate the impact of introducing a new payment method (e.g., pay later) on your web app, or a new fraud prediction model. We can help you model the effect of these new processes, both their benefits (the success rate of payments, or the rate of true fraudsters identified) and their risks and harms (users who would fail to pay for the product, or the rate of innocent people wrongly flagged as fraudsters).
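Weighing benefits against harms like this often starts from a confusion matrix and explicit cost assumptions. A minimal sketch, where all counts and euro figures are invented for illustration:

```python
def net_impact(tp, fp, fn, value_caught=500.0, cost_false_alarm=50.0, cost_missed=500.0):
    """Net value of a fraud model: benefit of fraud caught, minus the harm to
    wrongly flagged customers and the losses from fraud that slipped through.
    All cost parameters are illustrative placeholders, not real figures."""
    return tp * value_caught - fp * cost_false_alarm - fn * cost_missed

# hypothetical month: 80 fraudsters caught, 40 customers wrongly flagged,
# 20 fraudsters missed
impact = net_impact(tp=80, fp=40, fn=20)
```

Making these trade-offs explicit forces a conversation about how much harm to innocent users the business is willing to accept per fraudster caught, which is a causal and ethical question as much as a statistical one.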

For more questions and advice about including causal machine learning in your projects, feel free to reach out to our team.

