Correlation Versus Causality

The other day, I overheard a conversation in which two people were discussing traffic conditions and car accidents. One of them said, “There are a lot more car accidents when traffic is particularly heavy. So if you want to avoid a car accident, do not drive during rush hour.” Though I was eavesdropping, the statement troubled me because it jumped to a conclusion, or, more specifically, committed a fallacy from the post hoc ergo propter hoc family. Post hoc ergo propter hoc is a Latin phrase that literally means “after this, therefore because of this.” The fallacy occurs when we label one event as the cause of another simply because we observed it occurring earlier in the sequence of events. It has a cousin, cum hoc ergo propter hoc (“with this, therefore because of this”), which is subtly different in that it falsely assigns causality to events that occur simultaneously, as heavy traffic and accidents do. We are all guilty of one or both of these fallacies to some extent, because in our experience assigning causality by temporal priority or simultaneity has generally been a useful shortcut when we have only limited information about the relationship between two or more events or conditions.

The post hoc and cum hoc fallacies, however, also provide the foundation for many of our prejudices, superstitions, and poorly made decisions. It is the post hoc at work when a corporate CEO decides to stop paying bonuses simply because every time bonuses are paid, productivity and earnings drop. It is the cum hoc at work when the same CEO discontinues the sale of a particular product in the eastern region for the single reason that sales in the east are much lower than sales in the west. These conclusions may turn out to be valid, and they may rest on statistical analysis of data, but the reasoning behind them is not sound if its only basis is the correlation between two events or conditions. After all, couldn’t there be other factors that cause BOTH of the events or conditions in question? In the bonus/productivity example, couldn’t the end of the fiscal cycle (when most bonuses are paid) also coincide with a seasonal dip in productivity? In the regional sales example, could there be more sales activity or a more effective marketing channel or campaign in the west than in the east? Even in the traffic/accident conversation, couldn’t there be some aspect of the road itself that causes both more traffic and more accidents? Or better yet, what if the causation runs the other way and accidents actually cause more traffic?
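The common-cause scenario is easy to reproduce with made-up numbers. The following is a minimal sketch, not an analysis of real traffic data: it assumes a hypothetical hidden factor called hazard (some property of the road) that feeds into both traffic and accidents, while traffic and accidents have no effect on each other. All names, the sample size, and the noise levels are illustrative assumptions.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, seed=0):
    """Generate synthetic data in which a hidden factor drives both variables."""
    rng = random.Random(seed)
    # Hypothetical common cause, e.g. how hazardous a stretch of road is.
    hazard = [rng.gauss(0, 1) for _ in range(n)]
    # Each variable depends on the hazard plus its own independent noise.
    # Neither variable appears in the other's formula.
    traffic = [h + rng.gauss(0, 1) for h in hazard]
    accidents = [h + rng.gauss(0, 1) for h in hazard]
    return hazard, traffic, accidents


hazard, traffic, accidents = simulate()

# Correlation as an observer would see it, without knowing about the hazard.
raw = pearson(traffic, accidents)

# Correlation after subtracting the hazard's contribution from both variables.
adjusted = pearson(
    [t - h for t, h in zip(traffic, hazard)],
    [a - h for a, h in zip(accidents, hazard)],
)

print(f"raw correlation:      {raw:.2f}")
print(f"adjusted correlation: {adjusted:.2f}")
```

Under these assumptions the raw correlation comes out clearly positive, even though traffic never causes an accident in the simulation, and it falls to roughly zero once the hidden factor is accounted for. In practice the hidden factor is usually not measured, which is why correlation alone cannot settle the question.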

The statistical analysis of data will only get us as far as correlation, and even then, any correlation is established only at some confidence level that can never reach 100%. True causality can only be established through simulation and/or experimentation. For business process improvement, the simulation and/or experimentation required to validate causality takes place during the Improve Phase of the Define-Measure-Analyze-Improve-Control sequence of Six Sigma. The focus on true root cause analysis and identification is one of the primary reasons why Six Sigma is the business process improvement methodology of choice. If you would like to find out more about applying Six Sigma to improve business processes, contact us at Amelioration Incorporated.

