Just Causation

6 December 2022

Me: Does causation imply correlation?
ChatGPT: Yes, causation implies correlation. If one event causes another, then there must be a relationship or correlation between the two events. Causation is the relationship between two events in which one event causes the other to occur. This means that if event A causes event B, there must be a correlation between the two events, because the presence of A is necessary for B to occur.

‘Correlation doesn’t imply causation’ is a dictum that’s been repeated to death in statistics classes everywhere. But does that mean causation necessarily implies correlation? At first I would have answered much as ChatGPT did above, but as I’ve discovered, the answer depends on what you mean by ‘correlation’. To be precise (and it’s always good to be precise with math and statistics), causation does not necessarily imply correlation.

‘Correlation’ generally refers to linear correlation, as in Pearson’s correlation coefficient, which only measures linear relationships between variables. However, it’s perfectly possible for one variable to depend on another in a purely non-linear way while the corresponding linear correlation is zero. For example, if X is distributed symmetrically around zero and Y = X², then Y is completely determined by X, yet their Pearson correlation is zero. Non-linear correlation measures do exist, but mutual information is the measure we want in our new dictum: causation implies non-zero mutual information.
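To make the gap between linear correlation and mutual information concrete, here is a minimal sketch. It assumes X takes the values -2, -1, 0, 1, 2 with equal probability and Y = X², so Y depends entirely on X. The helpers `pearson` and `mutual_information` are hypothetical names written for this illustration; the mutual information is a simple plug-in estimate from the observed frequencies, in bits.

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Pearson's linear correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) from empirical joint frequencies."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))  # joint counts
    px, py = Counter(xs), Counter(ys)  # marginal counts
    mi = 0.0
    for (x, y), count in pxy.items():
        p = count / n
        mi += p * math.log2(p / ((px[x] / n) * (py[y] / n)))
    return mi

# Assumed toy distribution: X uniform over {-2, ..., 2}, Y = X^2 (fully determined by X)
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

print(pearson(xs, ys))             # 0.0: no linear correlation at all
print(mutual_information(xs, ys))  # about 1.52 bits: strong dependence
```

The positive and negative halves of X pull the linear fit in opposite directions and cancel out, so Pearson’s coefficient is exactly zero. Mutual information, which is zero only when the variables are independent, still picks up the dependence (here it equals the full entropy of Y, since knowing X tells you Y exactly).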

I take comfort in the fact that I’m not the only one who has been imprecise with words (and thus with understanding) in statistics. Nobel Prize winners do it too.

An aside about ChatGPT

The first time I tried the causation prompt with ChatGPT, it misread the question and gave a rote explanation of why correlation doesn’t imply causation. That these models are so easily thrown off by this sort of diversionary tactic (as Venelin explored in the adversarial QA paper) is what most dissuades me from the idea that latent language understanding and meaning can arise from the simple task of language modeling. However, when I tried it again before writing this post, ChatGPT got it right! I did manage to come up with another prompt that tricked it along the same lines, though:

Me: What came second, the chicken or the egg
ChatGPT: The question of which came first, the chicken or the egg, is a classic conundrum. It is often used as a metaphor for a problem that has no clear solution, or for a situation in which it is difficult to determine the cause and effect. In this case, there is no definitive answer, as the question itself is based on a false premise. Chickens hatch from eggs, so a chicken must have come from an egg. However, the egg that the chicken came from must have been laid by a chicken, so the chicken must have come first. This cycle continues indefinitely, making it impossible to determine which came first.