By Dr. Jamis Perrett
Regulatory Science Statistics Center
As both a statistician and an adjunct statistics professor, I occasionally come across “studies” that have misused data and reached inaccurate conclusions by graphing what appears to be a cause-and-effect relationship between two variables.
Unfortunately, it is not difficult to graph data that show a trend (correlation) and then leave the reader with the impression that a cause-and-effect relationship exists, that is, that one variable in some way influenced the other (causation). A causal link is certainly real in some situations, but in many instances the apparent causal link does not exist.
Next week, my students will be learning about correlation and causation. In class I typically provide several examples where two variables are found to be correlated, but no causal relationship exists. Here are a few:
- Shoe size vs. reading ability
- Ice cream sales vs. incidence of drownings
- Number of fire trucks present vs. the amount of damage produced by a three-alarm fire
- Organic food consumption vs. increase in gluten intolerance
Now consider a situation where the correlation likely does reflect causation. One example is:
- Amount of alcohol consumed vs. response time.
The causal relationship implies that the more alcohol a person consumes, the longer it takes that person to respond to stimuli; this is perhaps the main reason for outlawing driving while under the influence of alcohol. However, when we consider the correlation between organic food consumption and an increase in gluten intolerance, we recognize there is not a causal relationship. Gluten intolerance has been rising since 2000. So has the consumption of organic food. That does not mean that organic food is causing increased gluten intolerance.
Consider the correlation between shoe size and reading ability. Again, we recognize that there is not a causal relationship. Simply purchasing a larger shoe does not make a person smarter.
Why then is there a significant correlation between shoe size and reading ability? Older children tend to have larger feet than younger children. Those larger feet require larger shoes. Older children also tend to read at a higher level than do younger children. So, age, shoe size, and reading ability are all increasing at the same time—perhaps even at the same rate. But there is no causal relationship. Wearing bigger shoes does not cause a child to read better. If it did, I would purchase my six-year-old son some adult-size shoes with the expectation that doing so would elevate his reading ability.
Although all three conditions (age, shoe size, and reading ability) may be correlated with each other, we don’t ascribe a causal relationship among any of them: shoe size does not increase reading level, age alone does not increase reading level, age alone does not increase shoe size, shoe size does not increase age, reading level does not increase shoe size, and reading level does not increase a person’s age.
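The shoe-size example can be sketched as a small simulation. This is only an illustration under assumptions I made up for the purpose: the age range, the coefficients, the noise levels, and the function names (`pearson`, `residuals`, `simulate_children`) are all hypothetical, and age stands in for everything that changes as a child grows up. Shoe size and reading score are each generated from age plus independent noise; neither one is ever computed from the other.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def residuals(ys, xs):
    """What is left of ys after removing its straight-line fit on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


def simulate_children(n=2000, seed=0):
    """Made-up data: age feeds shoe size and reading score separately."""
    rng = random.Random(seed)
    ages, shoes, reading = [], [], []
    for _ in range(n):
        age = rng.uniform(5, 12)                      # years, assumed range
        ages.append(age)
        shoes.append(0.8 * age + rng.gauss(0, 1))     # never uses reading
        reading.append(10 * age + rng.gauss(0, 10))   # never uses shoe size
    return ages, shoes, reading


if __name__ == "__main__":
    ages, shoes, reading = simulate_children()
    raw = pearson(shoes, reading)
    adjusted = pearson(residuals(shoes, ages), residuals(reading, ages))
    print(f"shoe size vs. reading, ignoring age:     {raw:.2f}")
    print(f"shoe size vs. reading, adjusted for age: {adjusted:.2f}")
```

With these assumed settings, the first number should come out strongly positive and the second close to zero. Once age is accounted for, nothing links shoe size to reading in this toy data, even though a graph of one against the other would show a clear upward trend.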
Scientific research sometimes implies causation based on correlation alone. In some of those cases a causal relationship does in fact exist. In others, it does not.
So, in general, how do we identify causality? Ideally, researchers identify causality through an appropriately designed experiment that tests the hypothesis. In the absence of an appropriately designed experiment, causality cannot be determined. Correlation measures the linear association between two variables, but there are always other variables that could be the real cause of the results. Until appropriately designed experiments are conducted, a correlation simply means that two things are trending similarly; they may not be related at all.
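One feature of a well-designed experiment is random assignment, and a short simulation can suggest why it helps. The sketch below is hypothetical throughout: the function name `randomized_trial`, the effect sizes, and the noise levels are my own choices, not data from any study. Each simulated subject has a hidden characteristic that strongly affects the outcome, but a coin flip decides who receives the treatment, so the hidden characteristic is balanced between the two groups on average.

```python
import random


def randomized_trial(true_effect, n=4000, seed=1):
    """Difference in mean outcome (treated minus control) in a toy trial.

    true_effect is the amount the treatment really adds to the outcome;
    every number in this function is made up for illustration.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        hidden = rng.gauss(0, 1)               # lurking variable, e.g. age
        gets_treatment = rng.random() < 0.5    # coin flip, ignores hidden
        outcome = 10 * hidden + rng.gauss(0, 5)
        if gets_treatment:
            treated.append(outcome + true_effect)
        else:
            control.append(outcome)
    return sum(treated) / len(treated) - sum(control) / len(control)


if __name__ == "__main__":
    # A treatment with no real effect (think: handing out bigger shoes).
    print(f"no real effect:  {randomized_trial(0):.2f}")
    # A treatment that really shifts the outcome by 20 units.
    print(f"real effect, 20: {randomized_trial(20):.2f}")
```

Under these assumptions, the first difference should land near zero and the second near 20. Because the coin flip, not the hidden variable, decides who is treated, a difference between the groups can reasonably be attributed to the treatment itself.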
Dr. Jamis Perrett is the Product Analysis Lead in the Regulatory Statistics Technology Center at Monsanto, an Adjunct Associate Professor at University of Northern Colorado in Greeley, and the 2014 president of the Global Statistics and Modeling Community at Monsanto – a global group with more than 600 employee-members. Prior to joining Monsanto in 2012, he was an Assistant Professor at Texas A&M University in College Station (2008-2012) and at University of Northern Colorado in Greeley (2004-2008). He received his Ph.D. in statistics from Kansas State University in 2004.