Machine learning systems are being widely adopted and continue to provide immense benefits across industries. These systems learn patterns from large volumes of data and use them to make predictions. Useful as this is, not every question we need answered is predictive. For example, instead of asking ‘what will next week’s revenue be?’, the question could be ‘why is next week’s forecast lower, and what can we do about it?’. A prediction alone cannot answer many of these questions. Organizations want to know how to prevent a fall in sales. Will a new marketing campaign or a discount offer help? Do they need to rebrand or launch new features? Does historical data provide insights to answer these questions? How does their competitors’ strategy affect their sales?
One might argue that in linear regression the coefficients already estimate the influence of each variable on the outcome. However, that estimate has a flaw: because each coefficient is a single constant, it assigns the same effect to every observation in the sample. It’s one size fits all! The other issue with such algorithms is that they are bidirectional in nature: we can regress Y on X or X on Y, and as far as the data are concerned, either fit is equally valid. The focus is on correlation, not causation.
Decision-making involves understanding the effects of decisions and actions. Traditional correlation-based machine learning (ML) models can make predictions, but they are insufficient to explain causation. The data on which ML models are built may not contain all the factual and counterfactual scenarios for a given use case. ML models succeed at pattern recognition on curated, suitably collected data that is independent and identically distributed (i.i.d.). This is one reason ML models fail when exposed to new data from a different distribution or asked counterfactual questions.
We need to add more domain knowledge and external variables to ML models and validate them against multiple counterfactual scenarios. Causal learning systems provide a better framework for doing this than traditional prediction models. Prediction models are bidirectional in nature (e.g., given X we can predict Y, or given Y we can predict X), whereas causal models encode a direction in the relationship between X and Y (X causes Y, and not the reverse). Prediction focuses on the what, whereas causal models focus on the why. The two can complement each other.
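The direction in a causal model shows up when we intervene. Below is a minimal sketch of a toy structural causal model, assuming the illustrative linear equations X := noise and Y := 2X + noise (not taken from any real dataset). Observationally, knowing Y tells us a lot about X. But intervening on Y, written do(Y), leaves X unchanged, while intervening on X shifts Y. The function `simulate` and its `do_x`/`do_y` parameters are hypothetical names for this example.

```python
import random
from statistics import fmean


def simulate(n, do_x=None, do_y=None, seed=1):
    """Draw n samples from a toy structural causal model in which X causes Y.

    Passing do_x or do_y replaces that variable's equation with a constant,
    which is how the do-operator models an intervention; the other equation
    is left untouched.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = do_x if do_x is not None else rng.gauss(0, 1)          # X := noise
        y = do_y if do_y is not None else 2 * x + rng.gauss(0, 1)  # Y := 2X + noise
        xs.append(x)
        ys.append(y)
    return xs, ys


N = 50_000
xs, ys = simulate(N)

# Observational: among units that happen to have Y near 5, X is high too.
x_given_y = fmean(x for x, y in zip(xs, ys) if 4.5 < y < 5.5)

# Interventional: forcing X changes Y, but forcing Y does not change X.
_, ys_do_x = simulate(N, do_x=1.0)
xs_do_y, _ = simulate(N, do_y=5.0)

print(f"E[X | Y near 5] = {x_given_y:.2f}")       # roughly 2
print(f"E[Y | do(X=1)]  = {fmean(ys_do_x):.2f}")  # roughly 2
print(f"E[X | do(Y=5)]  = {fmean(xs_do_y):.2f}")  # roughly 0
```

A purely predictive model trained on `xs` and `ys` would reproduce the first number in either direction; only the causal structure predicts the asymmetry between the last two.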
Causal Learning is an endeavor to get closer to how we as humans understand the world, reason, and adapt to different scenarios without extensive knowledge acquisition. It involves building models that capture the causal mechanism behind the observed data, drawing on the data itself, additional variables, and explicit assumptions, and that account for the changes in data distribution produced by different interventions. These models abstract what has been learned and describe how the variables interact and in which direction they act. As a result, causal models are expected to generalize better and adapt to new domains more readily than machine learning models, which tend to overfit the data on which they are trained. Causal inference helps infer the effect of a policy, intervention, or treatment on the outcome. Examples of where causal learning can be applied include determining the following:
Following is an illustrative representation of a causal relationship in an e-commerce scenario.
In this image, we are looking at the effect of an intervention on the outcome Y. Traditional machine learning looks directly at the correlation between variables and outcomes; in causality, we look at the effect of the intervention T on the variable X and how that, in turn, impacts the outcome Y. Here, T denotes the intervention. Variables that influence both a treatment and the outcome are confounders, and failing to account for them can lead to wrong conclusions about the relationship between independent and dependent variables. Another interesting factor here is the influence of a new product launch by a competitor, U. Traditional machine learning treats many such extraneous factors as noise or error and ignores them.
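With causal inference, we can compute the ATE (Average Treatment Effect) or CATE (Conditional Average Treatment Effect) and measure the impact of an intervention on the outcome, for example, the increase in sales revenue with a discount offer compared with sales revenue with no offer. The ATE is the expected difference in outcome between treating and not treating, averaged over the whole population; the CATE is the same difference within a subgroup defined by some characteristic, such as a customer segment.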
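Here is a minimal sketch of ATE and CATE on hypothetical, hand-made numbers. It assumes a single observed confounder W (customer segment): loyal customers are offered discounts more often and also spend more anyway. The ATE is computed by backdoor adjustment (stratifying on W and weighting by each segment's share), which is one standard technique among several, such as matching or inverse propensity weighting. It is valid only under the stated assumption that W is the only confounder.

```python
from statistics import fmean

# Hypothetical, hand-made data. W = customer segment (1 = loyal), T = discount
# offered, Y = revenue per customer. Loyal customers receive discounts more often
# AND spend more anyway, so W confounds the T -> Y relationship.
cells = {  # (w, t): (number of customers, revenue per customer)
    (0, 1): (20, 60.0),
    (0, 0): (80, 50.0),
    (1, 1): (80, 130.0),
    (1, 0): (20, 100.0),
}
rows = [(w, t, y) for (w, t), (n, y) in cells.items() for _ in range(n)]


def mean_revenue(t, w=None):
    """Average revenue for treatment arm t, optionally within segment w."""
    return fmean(y for rw, rt, y in rows if rt == t and (w is None or rw == w))


# Naive contrast: compares treated vs untreated, ignoring the confounder W.
naive = mean_revenue(1) - mean_revenue(0)

# CATE: the treatment effect within each segment.
segments = sorted({w for w, _, _ in rows})
cate = {w: mean_revenue(1, w) - mean_revenue(0, w) for w in segments}

# ATE by backdoor adjustment: CATEs weighted by each segment's share P(W = w).
share = {w: sum(1 for rw, _, _ in rows if rw == w) / len(rows) for w in segments}
ate = sum(cate[w] * share[w] for w in segments)

print(f"naive difference: {naive:.1f}")  # 56.0
print(f"CATE by segment:  {cate}")       # {0: 10.0, 1: 30.0}
print(f"adjusted ATE:     {ate:.1f}")    # 20.0
```

The naive comparison attributes the loyal customers' higher spending to the discount and overstates its effect almost threefold. Adjusting for W removes only the bias that W introduces; an unobserved factor such as the competitor launch U would still bias the estimate if it affected both the discount decision and revenue.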
Leading AI researcher Yoshua Bengio compares the current maturity of deep learning to System 1 thinking and future causal-learning-based deep learning to System 2 thinking. System 1 and System 2 are behavioral science concepts popularized by Daniel Kahneman in his book Thinking, Fast and Slow. System 1 is intuitive, fast, unconscious, parallel, and habitual, whereas System 2 is slow, logical, sequential, conscious, and algorithmic. Causal learning is at a nascent stage of research, and it has immense potential because it helps us understand how various variables influence the outcome. It is envisioned to:
Microsoft Research has released DoWhy, an open-source Python library for causal inference. It implements causal inference in four major steps:
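DoWhy's four steps are commonly summarized as model, identify, estimate, and refute. The sketch below walks through the same four steps in plain Python on simulated data. It does not use DoWhy, and the graph, the data-generating numbers, and the helper names (`simulate`, `backdoor_set`, `estimate_ate`, `placebo_refuter`) are all hypothetical. Identification relies on the rule that, when all parents of the treatment are observed, they form a valid backdoor adjustment set. Refutation swaps the real treatment for a random placebo and checks that the estimated effect vanishes.

```python
import random
from statistics import fmean

# Step 1 - Model: state the assumed causal graph as the parents of each node.
# Hypothetical graph: W (customer segment) confounds T (discount) and Y (revenue).
GRAPH = {"W": [], "T": ["W"], "Y": ["T", "W"]}


def simulate(n, seed=0):
    """Synthetic data consistent with GRAPH; the true effect of T on Y is 20."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        t = 1 if rng.random() < (0.8 if w else 0.2) else 0
        y = 50 + 20 * t + 60 * w + rng.gauss(0, 5)
        rows.append({"W": w, "T": t, "Y": y})
    return rows


# Step 2 - Identify: the parents of the treatment block every backdoor path
# into it, so (if they are all observed) adjusting for them identifies the effect.
def backdoor_set(graph, treatment):
    return tuple(graph[treatment])


# Step 3 - Estimate: stratify on the adjustment set and weight each stratum's
# treated-minus-control difference by the stratum's share of the data.
def estimate_ate(rows, adjust, treatment="T", outcome="Y"):
    strata = {}
    for r in rows:
        strata.setdefault(tuple(r[a] for a in adjust), []).append(r)
    ate = 0.0
    for group in strata.values():
        treated = [r[outcome] for r in group if r[treatment] == 1]
        control = [r[outcome] for r in group if r[treatment] == 0]
        ate += (fmean(treated) - fmean(control)) * len(group) / len(rows)
    return ate


# Step 4 - Refute: replace T with a random placebo; a sound estimate should
# then collapse towards 0.
def placebo_refuter(rows, adjust, seed=1):
    rng = random.Random(seed)
    placebo = [dict(r, T=rng.randint(0, 1)) for r in rows]
    return estimate_ate(placebo, adjust)


rows = simulate(20_000)
adjust = backdoor_set(GRAPH, "T")
print("adjustment set:", adjust)                                 # ('W',)
print(f"naive difference: {estimate_ate(rows, ()):.1f}")         # roughly 56
print(f"adjusted ATE:     {estimate_ate(rows, adjust):.1f}")     # roughly 20
print(f"placebo ATE:      {placebo_refuter(rows, adjust):.1f}")  # roughly 0
```

In DoWhy itself, these steps correspond roughly to constructing a `CausalModel` and calling its `identify_effect`, `estimate_effect`, and `refute_estimate` methods, with `placebo_treatment_refuter` as one of the built-in refutation methods. The library also supports far richer graphs and estimators than this stratification sketch.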
Causal models offer the following benefits:
The future
Bringing in the best of ML and graph neural networks will help make progress in Causal Learning and achieve general intelligence capabilities, which is the ultimate goal of AI. Causal Learning is the next frontier in the advancement of AI, allowing us to build AI that can perceive, think, comprehend, decide, and act like humans.