Correlation vs. Causation: Avoid Jumping to Cause
Correlation is not causation. Too often we leap from correlation to causation on the basis of timing alone. Event A occurs, then Event B occurs. It’s easy to go from “A happened just before B happened” to “A caused B.” Case in point: in a NYC blackout, many people thought they had caused the power outage by flipping a switch or using an appliance. While the correlation is true enough (the two events did co-occur), turning on a hair dryer is not a valid cause. What’s missing is the mechanism of the cause: the logic that describes how a purported cause actually produced the observed effect.
But a close correlation between two events is so seductive that we tend to jump to cause. At that point you are close: you have found a pattern that might have some explanatory power. But you still need to clarify the mechanism of the cause.
In the power outage case, we can simply look at the logic of the situation: I’ve used this same appliance at the same time of day every day for over two years without causing a blackout, so how could today’s use have caused the blackout when all the others did not? During root cause analysis, we hear this kind of reasoning all the time: “There must be something about the new supplier of Compound K, because the day we switched to that supplier our yields went down.” And that’s as far as it goes. But it needs to go further.
In a recent case involving a bio-pharma process that dropped in yield by 22%, almost overnight, just such a supplier change was present. “It’s the new soy oil,” they declared. “It must be.” The decrease in production did correspond in time to the change in suppliers, but that’s not enough. What is it about that new soy oil that’s causing the drop in production? The key fact in that case was when in the process the production dropped. Did it drop during the growth phase, when the bugs are multiplying? Did it drop in the production phase, when those fully grown bugs are producing antibodies?
In this case, it was the growth phase that had dropped. After the growth phase, the process produced as much protein as expected for the smaller number of organisms that came out of the growth phase. We asked, “What is it about the soy oil that aids the growth of organisms?” The answer was protein: the higher the level of protein in the soy oil, the greater the production. Armed with this logic, we compared the protein levels of the old and new soy oils. It turned out that the new oil had over 20% less protein, which supplied the mechanism needed to identify the change in suppliers as the cause.
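The phase-by-phase question in this case can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the analysis that was actually run: the function name `locate_drop`, the dictionary keys, and every number below are hypothetical, chosen so that the overall drop matches the 22% mentioned above. The idea is to split the yield change into a growth-phase ratio (how many organisms there were) and a production-phase ratio (how much each organism produced).

```python
def locate_drop(before, after):
    """Split a yield change into growth-phase and production-phase parts.

    Each argument is a dict with two hypothetical keys:
      "organisms": organism count at the end of the growth phase
      "product":   total product at the end of the production phase
    Returns (growth_ratio, productivity_ratio), each as after / before.
    """
    # Did the growth phase deliver fewer organisms?
    growth_ratio = after["organisms"] / before["organisms"]

    # Did each organism produce less during the production phase?
    per_organism_before = before["product"] / before["organisms"]
    per_organism_after = after["product"] / after["organisms"]
    productivity_ratio = per_organism_after / per_organism_before

    return growth_ratio, productivity_ratio


# Made-up batch figures, for illustration only.
old_batch = {"organisms": 1000.0, "product": 500.0}
new_batch = {"organisms": 780.0, "product": 390.0}

growth_ratio, productivity_ratio = locate_drop(old_batch, new_batch)

# The overall yield ratio is the product of the two phase ratios.
total_ratio = growth_ratio * productivity_ratio

print(f"growth phase ratio:     {growth_ratio:.2f}")
print(f"production phase ratio: {productivity_ratio:.2f}")
print(f"overall yield ratio:    {total_ratio:.2f}")
```

With these invented figures, the whole drop sits in the growth ratio while the per-organism ratio stays at 1. That is the pattern that would point the investigation toward whatever in the soy oil supports growth, rather than toward the production phase.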
Next time you hear someone positioning a correlation as a cause, ask about the mechanism: “I can see that these two things co-occur, but how does this change cause the specific deviation we are seeing?” If you can answer that, you have a real cause you can test.