Machine Learning and where we stand today
Machine learning systems are being widely adopted and continue to provide immense benefits to industry at large. These systems learn patterns from large volumes of data and provide predictive capabilities. While this is good, not every question we need answered is predictive. For example, instead of asking ‘what will next week’s revenue forecast be’, the question could be ‘why is next week’s forecast lower’, and how the shortfall can be addressed. Mere prediction is not enough to answer such questions. Organizations would like to know how to prevent a fall in sales. Will a new marketing campaign or a discount offer help address it? Do they need to rebrand or launch new features? Does historical data provide insights to answer these questions? How does their competitors’ strategy affect their sales?
While we may argue that in linear regression the coefficients provide an estimate of the influence of different variables on the outcome, that estimate is flawed: because each coefficient is constant, every observation in the sample is assumed to experience the same effect. It’s one size fits all! The other issue with such algorithms is that they are bidirectional in nature: we can regress Y on X just as easily as X on Y. The focus is on correlation, not causation.
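The bidirectionality point above is easy to demonstrate. The following is a minimal sketch on simulated data (the variables and coefficients are hypothetical): correlation is symmetric, and a regression fits a slope in either direction, even though the data were generated in only one direction.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000)
y = 2.0 * x + rng.normal(size=1_000)  # by construction, x causes y

# Pearson correlation is symmetric, so it cannot say which
# variable drives the other.
r_xy = np.corrcoef(x, y)[0, 1]
r_yx = np.corrcoef(y, x)[0, 1]
print(r_xy, r_yx)  # the same number both ways

# A regression fits a nonzero slope in either direction, even though
# only y-on-x reflects the data-generating process.
slope_y_on_x = np.polyfit(x, y, 1)[0]  # close to the true mechanism, 2.0
slope_x_on_y = np.polyfit(y, x, 1)[0]  # close to 0.4, equally "valid" to the fit
print(slope_y_on_x, slope_x_on_y)
```

Nothing in the fitted numbers tells us which regression matches the causal direction; that knowledge has to come from outside the data.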
Towards Causal Learning
Decision-making involves understanding the effects of decisions and actions. Traditional prediction models built with correlation-based machine learning (ML) are insufficient to explain causation. The underlying data on which ML models are built may not contain all the factual and counterfactual scenarios for a given use case. ML models succeed at pattern recognition on curated, suitably collected independent and identically distributed (i.i.d.) data, which is one reason they fail when exposed to new data or counterfactual questions. We need to add more domain knowledge and external variables to ML models, and they need to be validated against multiple counterfactual scenarios. Causal Learning systems provide a framework for getting this done better than traditional prediction models. Prediction models are bidirectional in nature (e.g., given X we can predict Y, or given Y we can predict X), whereas causal models have a direction to the relationship between X and Y (X causes Y, and the relationship is unidirectional). Prediction focuses on What, whereas causal learning focuses on Why. The two can complement each other.
Causal Learning is an endeavor to get closer to how we as humans understand the world, how we reason, and how we adapt to different scenarios without extensive knowledge acquisition. Causal Learning involves creating models that capture the causal mechanism based on the observed data, a new set of variables, and assumptions, and that incorporate the changes in data distribution under different interventions. The models abstract the learnings and provide the interactions between the variables and the direction in which they operate. Causal models therefore generalize what they learn and are better suited for domain adaptation, unlike machine learning models, which over-fit to the underlying data on which they are trained. Causal inference helps to infer the effect of any policy, intervention, or treatment on the outcome. Some examples of where causal learning can be applied would be in determining the —
- Effect of various treatments for a disease
- Effect of marketing campaigns on sales
- Effect of salary increments on employee retention
- Effect of customer satisfaction on improved service
- Effect of new product features on customer adoption
Following is an illustrative representation of a causal relationship in an e-commerce scenario.
In this image, we are looking at the effect of an intervention on an outcome Y. While in traditional machine learning we directly look at the correlation between the variables and the outcome, in causality we look at the effect of the intervention T on the outcome Y while accounting for the variable X, which influences both the intervention and the outcome. Here X is a confounding variable. Failing to account for confounding variables can lead to wrong conclusions about the relationship between independent and dependent variables. Another interesting factor here is the influence of a new product launch by a competitor, U, which is unobserved. In traditional machine learning, many of these extraneous factors are treated as noise or errors and are then ignored.
With causal inference we would be able to compute the ATE (Average Treatment Effect) or CATE (Conditional Average Treatment Effect) and measure the impact of an intervention on the outcome, e.g., the increase in sales revenue due to a discount offer versus sales revenue with no offer.
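When the intervention is randomized, the ATE reduces to a difference in group means. Below is a minimal sketch on simulated data for the discount example; the variable names and the true effect of +5 are hypothetical, chosen only to make the estimate checkable.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical randomized experiment: a discount offer is assigned
# at random, so treated and control customers are comparable.
discount = rng.integers(0, 2, size=n)        # T: 1 = offer shown
baseline = rng.normal(100.0, 10.0, size=n)   # customer-level variation
revenue = baseline + 5.0 * discount          # true effect is +5

# ATE = E[Y | T=1] - E[Y | T=0]; the simple difference in means is
# valid here precisely because T is randomized.
ate = revenue[discount == 1].mean() - revenue[discount == 0].mean()
print(f"estimated ATE: {ate:.2f}")  # close to the true effect of 5
```

CATE follows the same idea but conditions on covariates (e.g., computing the effect separately per customer segment), which is how heterogeneous responses to the same offer are surfaced.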
Where are we on Causal Learning?
Leading AI expert Yoshua Bengio compares current deep learning maturity to System 1 thinking and future causal-learning-based deep learning to System 2 thinking. System 1 and System 2 are behavioral science concepts espoused by Daniel Kahneman in his book, Thinking, Fast and Slow. System 1 is intuitive, fast, unconscious, parallel, and habitual, whereas System 2 is slow, logical, sequential, conscious, and algorithmic. Causal learning is at a nascent stage of research, and there is immense potential since it helps us understand how various variables influence the outcome. It is envisioned to –
- Capture the causality
- Capture the understanding of the interrelation between different variables and how the overall system works
- Provide explainability on how decisions are made/actions are driven
- Provide reasoning when queried
- Generalize for new domains and out-of-distribution scenarios
Which algorithms are available to explore Causal Learning?
Microsoft Research has come up with DoWhy, an open-source library for causal inference. It helps implement causal inference in four major steps:
- Modeling: Create a causal graph of the system under consideration. Causal graphs are probabilistic graphical models that encode assumptions about the data-generating process, including variables unobserved in the data.
- Identification: Formulate what to estimate.
- Estimation: Compute the estimate.
- Refutation: Validate the assumptions and test counterfactual scenarios.
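The logic DoWhy structures into these four steps can be sketched by hand. The following is an illustrative pure-NumPy example, not DoWhy's actual API; the scenario (a seasonal confounder of discounts and revenue) and all coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000

# 1. Modeling: assume the causal graph
#    season -> discount, season -> revenue, discount -> revenue,
#    i.e., "season" confounds the discount-revenue relationship.
season = rng.integers(0, 2, size=n)                          # 1 = holiday season
discount = (rng.random(n) < 0.2 + 0.6 * season).astype(int)  # offered more in season
revenue = 100 + 5.0 * discount + 20.0 * season + rng.normal(0, 5, n)

# 2. Identification: under the graph above, the backdoor criterion says
#    the effect of discount on revenue is identified by adjusting for season.

# 3. Estimation: stratify on season and average the within-stratum
#    differences, weighted by stratum size.
naive = revenue[discount == 1].mean() - revenue[discount == 0].mean()
adjusted = 0.0
for s in (0, 1):
    mask = season == s
    diff = (revenue[mask & (discount == 1)].mean()
            - revenue[mask & (discount == 0)].mean())
    adjusted += diff * mask.mean()
print(f"naive difference: {naive:.2f}")   # biased upward by the season
print(f"adjusted ATE:     {adjusted:.2f}")  # close to the true effect of 5

# 4. Refutation: replace the treatment with a random placebo; the same
#    adjusted estimator should then return a value near zero.
placebo = rng.integers(0, 2, size=n)
placebo_est = 0.0
for s in (0, 1):
    mask = season == s
    placebo_est += (revenue[mask & (placebo == 1)].mean()
                    - revenue[mask & (placebo == 0)].mean()) * mask.mean()
print(f"placebo estimate: {placebo_est:.2f}")  # sanity check, ~0
```

DoWhy automates each of these steps (graph specification, backdoor identification, a choice of estimators, and refuters such as the placebo-treatment test) so that the assumptions stay explicit and testable.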
Causal models offer the following benefits:
- The error on out-of-distribution data is less for causal models compared to ML models.
- They provide answers to counterfactual questions.
- They adapt better to new domains and are better able to handle concept drift.
- They are more secure as they offer better differential privacy and are robust against privacy attacks like membership inference attacks.
- Explainability is inbuilt.
- They make it easy to model what-if scenarios and simulations.
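The what-if point can be made concrete with a small structural causal model. The following is an illustrative sketch; the pricing scenario, variable names, and coefficients are all hypothetical. Simulating an intervention means overriding a variable's structural equation (Pearl's do-operator) rather than filtering observed data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy structural causal model (hypothetical):
#   demand = U_d
#   price  = 50 - 0.5 * demand + U_p      (sellers cut price when demand is low)
#   sales  = 100 + 2.0 * demand - 0.8 * price + U_s
def simulate(n, do_price=None):
    demand = rng.normal(10, 2, n)
    price = 50 - 0.5 * demand + rng.normal(0, 1, n)
    if do_price is not None:
        # do(price = c): override the structural equation for price.
        price = np.full(n, float(do_price))
    sales = 100 + 2.0 * demand - 0.8 * price + rng.normal(0, 1, n)
    return sales

# What-if: compare expected sales under two pricing interventions.
sales_at_40 = simulate(50_000, do_price=40).mean()
sales_at_45 = simulate(50_000, do_price=45).mean()
print(f"do(price=40): {sales_at_40:.1f}")
print(f"do(price=45): {sales_at_45:.1f}")
# Cutting price by 5 raises expected sales by about 0.8 * 5 = 4,
# the structural effect, free of the demand->price confounding.
```

The same model answers arbitrary what-if questions by swapping in different interventions, which is exactly the kind of simulation the bullet above refers to.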
Combining the best of ML and graph neural networks will help make progress in Causal Learning and move toward general intelligence capabilities, the ultimate goal of AI. Causal Learning is the next frontier in the advancement of AI, allowing us to build AI that can perceive, think, comprehend, decide, and act like humans.