Protecting Yourself in the Age of Information: Simpson’s Paradox

February 3, 2019 In Business

A growing problem in today’s digital age is the tendency for false or misleading information to mix with the legitimate, muddying the proverbial waters and making it difficult to navigate the sea of information without getting some of the contaminated muck on yourself. Whether it’s Trump’s constant tirades about “fake news” or the latest online article proclaiming “New Cancer Drug Kills 100% of Cancer Cells!”, it becomes difficult to separate fact from fiction, advertisement from news, and the informative from the fluff. Statistics are one of the most common sources of confusion, because they can be used manipulatively to mislead readers.

Consider Simpson’s Paradox, named after statistician Edward Simpson: a trend that appears within each of several groups of data disappears or reverses when the groups are combined. It illustrates how surface-level data can fool you by hiding what lies beneath. The classic example involves UC Berkeley, which in 1973 faced accusations of gender discrimination against women based on its graduate admissions figures:

	Applicants	Admitted
Men	8442	44%
Women	4321	35%

At face value, the data suggest that men were significantly more likely than women to be admitted to UC Berkeley. Why is this misleading? Let’s look at the admission rates in the six largest departments at UC Berkeley:

Department	Men: applicants	Men: admitted	Women: applicants	Women: admitted
A	825	62%	108	82%
B	560	63%	25	68%
C	325	37%	593	34%
D	417	33%	375	35%
E	191	28%	393	24%
F	373	6%	341	7%

Notice something funny? In four of the six departments (A, B, D, and F), women were actually more likely than men to be admitted. Why, then, do the totals show a higher proportion of men being admitted?

Direct your attention to the row for Department A and compare the number of applicants for men and women. Even though women were admitted to Department A at a rate 20 percentage points higher than men (82% versus 62%), far fewer women than men were admitted in absolute terms, because far fewer women applied. The same pattern holds for Department B.
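Multiplying each admission rate by the number of applicants gives a rough count of how many people were admitted. The counts are approximate because the table’s percentages are rounded:

```latex
\begin{aligned}
\text{Dept. A:}\quad & 0.82 \times 108 \approx 89 \text{ women admitted}, && 0.62 \times 825 \approx 512 \text{ men admitted} \\
\text{Dept. B:}\quad & 0.68 \times 25 = 17 \text{ women admitted}, && 0.63 \times 560 \approx 353 \text{ men admitted}
\end{aligned}
```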

Compare that to Department C, where far more women than men applied, but only 34% of the women were admitted. It turns out that women tended to apply to highly competitive departments with low admission rates, while men tended to gravitate toward less competitive departments with high admission rates. The choice of department is the hidden (confounding) variable, and it explains the gender discrimination that the surface-level data seem to suggest.
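To see the reversal happen, here is a minimal Python sketch that pools the six departments from the table. It assumes admitted counts can be reconstructed as applicants × rate, which is only approximate because the published rates are rounded. The names `DEPARTMENTS`, `pooled_rate`, and `departments_favoring` are illustrative, not from any library.

```python
# Six largest UC Berkeley departments (1973), from the table above.
# Each entry is (applicants, admission rate). Rates are the rounded
# percentages from the table, so reconstructed admitted counts are approximate.
DEPARTMENTS = {
    "A": {"men": (825, 0.62), "women": (108, 0.82)},
    "B": {"men": (560, 0.63), "women": (25, 0.68)},
    "C": {"men": (325, 0.37), "women": (593, 0.34)},
    "D": {"men": (417, 0.33), "women": (375, 0.35)},
    "E": {"men": (191, 0.28), "women": (393, 0.24)},
    "F": {"men": (373, 0.06), "women": (341, 0.07)},
}


def pooled_rate(departments, group):
    """Overall admission rate for `group` after merging all departments."""
    applicants = sum(d[group][0] for d in departments.values())
    # Approximate admitted count per department: applicants * rate.
    admitted = sum(n * rate for n, rate in (d[group] for d in departments.values()))
    return admitted / applicants


def departments_favoring(departments, group, other):
    """Departments where `group` has a strictly higher admission rate than `other`."""
    return [name for name, d in departments.items() if d[group][1] > d[other][1]]


if __name__ == "__main__":
    print("Departments favoring women:", departments_favoring(DEPARTMENTS, "women", "men"))
    print(f"Pooled rate, men:   {pooled_rate(DEPARTMENTS, 'men'):.1%}")
    print(f"Pooled rate, women: {pooled_rate(DEPARTMENTS, 'women'):.1%}")
```

Run as a script, this reports departments A, B, D, and F as favoring women, yet gives pooled rates of about 44.5% for men and 30.3% for women. These pooled figures cover only the six departments, so they differ from the university-wide 44% and 35%.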

Simpson’s Paradox also shows up in sports statistics, such as hockey. Consider the following hypothetical save percentages (save%) for two goalies, Swiss Cheese and Mr. Sieve, across two seasons:

(Each cell shows saves/shots faced, with save% in parentheses.)
Goalie	2017	2018	2017 and 2018
Swiss Cheese	456/487 (0.936)	2004/2301 (0.871)	2460/2788 (0.882)
Mr. Sieve	2116/2312 (0.915)	1544/1789 (0.863)	3660/4101 (0.892)

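Despite Swiss Cheese having the better save% in both 2017 and 2018, his combined save% for the two years is lower than Mr. Sieve’s. The number of shots each goalie faced in each year explains why. Swiss Cheese faced most of his shots (2301 of 2788) in 2018, when both goalies had lower save percentages, while Mr. Sieve faced most of his (2312 of 4101) in the stronger 2017 season. A combined save% is an average of the two seasons weighted by shots faced, so Swiss Cheese’s total is dragged toward his weaker 2018 number.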
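The weighting argument can be checked with a short Python sketch. It uses the table’s saves/shots figures, taking Swiss Cheese’s 2018 saves as 2004 (the value consistent with the listed 0.871 save% and 2460 combined saves). The helper names (`save_pct`, `combined_save_pct`, `shot_weighted_average`) are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical (saves, shots) per season for each goalie, from the table above.
GOALIES = {
    "Swiss Cheese": {"2017": (456, 487), "2018": (2004, 2301)},
    "Mr. Sieve": {"2017": (2116, 2312), "2018": (1544, 1789)},
}


def save_pct(saves, shots):
    """Fraction of shots faced that were saved."""
    return saves / shots


def combined_save_pct(seasons):
    """Save% after pooling every season's saves and shots."""
    saves = sum(s for s, _ in seasons.values())
    shots = sum(n for _, n in seasons.values())
    return save_pct(saves, shots)


def shot_weighted_average(seasons):
    """Average of per-season save% weighted by each season's share of shots."""
    total_shots = sum(n for _, n in seasons.values())
    return sum((n / total_shots) * save_pct(s, n) for s, n in seasons.values())


if __name__ == "__main__":
    for name, seasons in GOALIES.items():
        yearly = {year: round(save_pct(*sn), 3) for year, sn in seasons.items()}
        pooled = combined_save_pct(seasons)
        # The pooled save% equals the shot-weighted average (up to float rounding).
        assert math.isclose(pooled, shot_weighted_average(seasons))
        print(name, yearly, "combined:", round(pooled, 3))
```

Running it prints per-season save percentages of 0.936 and 0.871 for Swiss Cheese versus 0.915 and 0.863 for Mr. Sieve, yet combined values of 0.882 and 0.892. The `shot_weighted_average` function exists only to show that pooling is the same as weighting each season by its share of shots.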

So what can we take away from Simpson’s Paradox? Simply this: be careful about taking any statistic at face value, because, as with anything, there is often something hiding under the hood.
