Understanding The “Why”: 10 Techniques for Causal Inference
With the right tools you can get some pretty deep insights
In analyzing data at Wangari, one question has kept coming up over the past few months, both internally and from our clients: why?
While studying the correlations between companies' sustainability efforts and their financials, we found that in male-dominated industries, a higher percentage of women in management goes hand in hand with higher profitability.
The question is why. Do women managers somehow work harder or differently, thus causing higher profits? Or do higher profits motivate a company to hire more women in management? Or perhaps both are influenced by an underlying factor, like a progressive company culture?
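To make the third possibility concrete, here is a minimal sketch on purely synthetic data (not our client data). It assumes a hidden `culture` score that drives both the share of women in management and profit, with no direct link between the two. All variable names, coefficients, and the helper `residualize` are assumptions made up for illustration.

```python
import numpy as np


def simulate_confounding(n=5000, seed=0):
    """Generate synthetic data where a hidden factor drives both variables."""
    rng = np.random.default_rng(seed)
    culture = rng.normal(size=n)                      # hypothetical hidden confounder
    women_share = 0.8 * culture + rng.normal(size=n)  # driven by culture plus noise
    profit = 0.8 * culture + rng.normal(size=n)       # driven by culture only, NOT by women_share
    return culture, women_share, profit


def residualize(y, x):
    """Return the part of y that a linear fit on x (with intercept) cannot explain."""
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


culture, women_share, profit = simulate_confounding()

# Raw correlation: clearly positive, although there is no causal link by construction.
raw_corr = np.corrcoef(women_share, profit)[0, 1]

# Partial correlation: correlate what is left after removing the confounder's influence.
partial_corr = np.corrcoef(
    residualize(women_share, culture),
    residualize(profit, culture),
)[0, 1]

print(f"raw correlation:     {raw_corr:.2f}")
print(f"partial correlation: {partial_corr:.2f}")
```

In this toy setup the raw correlation is clearly positive, while the correlation after adjusting for `culture` is close to zero. With real data the confounder is rarely observed this cleanly, and ruling out reverse causation takes more than a regression adjustment.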
Understanding causation, not just correlation, is essential for making informed business decisions. In fields like finance and sustainability, where data is abundant but complex, causal inference provides the tools to tell the two apart.
It is already valuable for us to be able to go to several different companies (and their bankers) and tell them that, from a statistical point of view, they would be increasing their chances for future success by hiring more women managers. However, being able to tell them about an actual causal relationship would make this argument even more compelling.
We figured that we are not the only ones who have had to go beyond Data Science 101 and dig deeper into causal inference. In this article, we introduce ten powerful and popular techniques that help data scientists move from the “what” to the “why.” We also cover general best practices for causal inference, and when to use which technique.
As we start getting our hands dirty ourselves, we will share our experience with specific techniques and combinations thereof, along with many of our research findings, in subsequent articles.
Causal Inference Is Challenging
Causal inference is fundamentally harder than correlation analysis because it requires untangling the intricate web of relationships between variables. In real-world datasets, multiple factors often interact, making it difficult to determine what’s driving…