Simpson’s paradox | Definition, Example, and Explanation (2024)

Also known as: Yule-Simpson effect

Written by

Bruce W. Carlson, Associate Professor, Department of Psychology, Ohio University, Athens, Ohio. His contributions to SAGE Publications's Encyclopedia of Research Design (2010) formed the basis of his contributions...

Fact-checked by

The Editors of Encyclopaedia Britannica Encyclopaedia Britannica's editors oversee subject areas in which they have extensive knowledge, whether from years of experience gained by working on that content or via study for an advanced degree. They write new content and verify and edit content received from contributors.

Related Topics: statistics

Simpson’s paradox, in statistics, an effect that occurs when the marginal association between two categorical variables is qualitatively different from the partial association between the same two variables after controlling for one or more other variables. Simpson’s paradox is important for three reasons. First, people often expect statistical relationships to be immutable, but they often are not: the relationship between two variables might strengthen, weaken, or even change direction depending on the set of variables being controlled. Second, Simpson’s paradox is not an obscure phenomenon of interest only to a small group of statisticians; it is one of a large class of association paradoxes. Third, Simpson’s paradox reminds researchers that causal inferences, particularly in nonexperimental studies, can be hazardous, because uncontrolled and even unobserved variables might exist that would eliminate or reverse the association observed between two variables.

Illustration

Understanding Simpson’s paradox is easiest in the context of a simple example. Suppose that a university is concerned about sex bias during the admission process to graduate school. To study this, applicants to the university’s graduate programs are classified by sex and admissions outcome. The resulting data would seem to be consistent with the existence of a sex bias, because men (40 percent were admitted) were more likely to be admitted to graduate school than women (25 percent were admitted).

To identify the source of the difference in admission rates for men and women, the university subdivides applicants based on whether they applied to a department in the natural sciences or to one in the social sciences and then conducts the analysis again. Surprisingly, the university finds that the direction of the relationship between sex and outcome has reversed. In natural science departments, women (80 percent were admitted) were more likely to be admitted to graduate school than men (46 percent were admitted); similarly, in social science departments, women (20 percent were admitted) were more likely to be admitted to graduate school than men (4 percent were admitted).

Although the reversal in association that is observed in Simpson’s paradox might seem bewildering, it is actually straightforward. In this example, it occurred because both sex and admissions outcome were related to a third variable, namely, the department. First, women were more likely to apply to social science departments, whereas men were more likely to apply to natural science departments. Second, the acceptance rate in social science departments was much lower than that in natural science departments. Because women were more likely than men to apply to programs with low acceptance rates, when department was ignored (i.e., when the data were aggregated over the entire university), it seemed that women were less likely than men to be admitted to graduate school, whereas within each type of department the reverse was true. Although hypothetical examples such as this one are simple to construct, numerous real-life examples can be found easily in the social science and statistics literature.
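The arithmetic behind the reversal can be checked with a short Python sketch. The article gives only percentages, so the applicant counts below are hypothetical: they are one set of numbers, chosen for this illustration, that reproduces the quoted admission rates.

```python
# Hypothetical applicant counts (an assumption, not data from the article),
# chosen so that the rates match the percentages quoted in the illustration.
# counts[sex][field] = (admitted, applicants)
counts = {
    "men": {"natural": (276, 600), "social": (4, 100)},
    "women": {"natural": (40, 50), "social": (110, 550)},
}


def partial_rates(counts):
    """Admission rate for each sex within each type of department."""
    return {
        sex: {field: admitted / applicants
              for field, (admitted, applicants) in fields.items()}
        for sex, fields in counts.items()
    }


def marginal_rates(counts):
    """Admission rate for each sex after aggregating over departments."""
    rates = {}
    for sex, fields in counts.items():
        admitted = sum(a for a, _ in fields.values())
        applicants = sum(n for _, n in fields.values())
        rates[sex] = admitted / applicants
    return rates


for sex, r in marginal_rates(counts).items():
    print(f"{sex:<6} overall : {r:.0%}")
for sex, fields in partial_rates(counts).items():
    for field, r in fields.items():
        print(f"{sex:<6} {field:<8}: {r:.0%}")
```

With these counts the aggregated rates are 40 percent for men and 25 percent for women, yet women have the higher rate within each type of department. The reversal comes entirely from the unequal split of applicants: most of the hypothetical men apply to natural science departments, and most of the hypothetical women apply to social science departments.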

Definition

Consider three random variables X, Y, and Z. Define a 2 × 2 × K cross-classification table by assuming that X and Y can be coded either 0 or 1, and Z can be assigned values from 1 to K.

The marginal association between X and Y is assessed by collapsing across or aggregating over the levels of Z. The partial association between X and Y controlling for Z is the association between X and Y at each level of Z or after adjusting for the levels of Z. Simpson’s paradox is said to have occurred when the pattern of marginal association and the pattern of partial association differ.
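One way to write the two kinds of association down is sketched below. The cell-count notation n_{xyk} and the use of conditional probabilities as the measure of association are assumptions made for this illustration; the definition above does not depend on a particular measure.

```latex
% n_{xyk}: number of observations with X = x, Y = y, Z = k (notation assumed here)
\begin{align*}
\text{marginal table:}\quad
  & n_{xy+} = \sum_{k=1}^{K} n_{xyk}, \\
\text{marginal association:}\quad
  & P(Y=1 \mid X=1) \ \text{versus}\ P(Y=1 \mid X=0), \\
\text{partial association at level } k\text{:}\quad
  & P(Y=1 \mid X=1, Z=k) \ \text{versus}\ P(Y=1 \mid X=0, Z=k), \\
\text{link between the two:}\quad
  & P(Y=1 \mid X=x) = \sum_{k=1}^{K} P(Y=1 \mid X=x, Z=k)\, P(Z=k \mid X=x).
\end{align*}
```

The last line shows why the two patterns can differ: each marginal probability is a weighted average of the partial probabilities, and the weights P(Z = k | X = x) may be different for X = 1 and X = 0. A reversal occurs when, for example, P(Y = 1 | X = 1, Z = k) > P(Y = 1 | X = 0, Z = k) at every level k while P(Y = 1 | X = 1) < P(Y = 1 | X = 0).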

Various indices exist for assessing the association between two variables. For categorical variables, the odds ratio and the relative risk ratio are the two most common measures of association. Simpson’s paradox is the name applied to differences in the association between two categorical variables, regardless of how that association is measured.
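The following Python sketch computes both measures for the admissions illustration. The cell counts are hypothetical (the article reports only percentages) and were chosen to reproduce the quoted admission rates; the function names are likewise illustrative.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2 x 2 table.

    a, b: group 1 with and without the outcome
    c, d: group 2 with and without the outcome
    """
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Ratio of the outcome proportion in group 1 to that in group 2."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts (an assumption, not data from the article):
# (women admitted, women rejected, men admitted, men rejected)
tables = {
    "natural sciences": (40, 10, 276, 324),
    "social sciences": (110, 440, 4, 96),
}

# Marginal table: add the partial tables cell by cell.
marginal = tuple(sum(cells) for cells in zip(*tables.values()))

print("marginal", odds_ratio(*marginal), relative_risk(*marginal))
for name, cells in tables.items():
    print(name, odds_ratio(*cells), relative_risk(*cells))
```

For these counts the marginal odds ratio (0.5) and relative risk (0.625) are both below 1, suggesting that women are less likely to be admitted, whereas both measures are above 1 in each partial table. The reversal shows up whichever of the two measures is used.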

Association Paradoxes

Association paradoxes, of which Simpson’s paradox is a special case, can occur between continuous variables (variables that can take any value within a range) or categorical variables (variables that can take only certain values). For example, the best-known measure of association between two continuous variables is the correlation coefficient. It is well known that the marginal correlation between two variables can have one sign, whereas the partial correlation between the same two variables after controlling for one or more additional variables has the opposite sign.
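A small Python sketch of such a sign change follows. The eight data points are made up for illustration, and the partial correlation uses the standard first-order formula for a single control variable z.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    ss_u = sum((a - mean_u) ** 2 for a in u)
    ss_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(ss_u * ss_v)


def partial_corr(x, y, z):
    """Correlation of x and y after controlling for a single variable z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up data (an assumption): two groups, coded z = 0 and z = 1.
# Within each group y tends to fall as x rises, but the z = 1 group
# sits higher on both x and y.
x = [1, 2, 3, 4, 6, 7, 8, 9]
y = [4, 2, 3, 1, 9, 7, 8, 6]
z = [0, 0, 0, 0, 1, 1, 1, 1]

print("marginal correlation:", round(pearson(x, y), 3))
print("within z = 0:", round(pearson(x[:4], y[:4]), 3))
print("within z = 1:", round(pearson(x[4:], y[4:]), 3))
print("partial correlation:", round(partial_corr(x, y, z), 3))
```

Here the marginal correlation is positive because the difference between the two groups dominates the pooled data, while the correlation within each group, and the partial correlation controlling for z, is negative.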

Reversal paradoxes, in which the marginal and partial associations between two variables have different signs, such as Simpson’s paradox, are the most dramatic of the association paradoxes. A weaker form of association paradox occurs when the marginal and partial associations have the same sign, but the magnitude of the marginal association falls outside of the range of values of the partial associations computed at individual levels of the variable(s) being controlled. These have been termed amalgamation or aggregation paradoxes.
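The distinction between the two forms can be sketched as a small Python function. It assumes the association is measured on a signed scale where 0 means no association (for example, a difference in proportions or a log odds ratio); the function name and the returned labels are illustrative, not standard terminology from a library.

```python
def classify(marginal, partials):
    """Label the relation between a marginal and a set of partial associations.

    All values are assumed to be on a signed scale where 0 means
    "no association" (e.g., a difference in proportions).
    """
    if all(p > 0 for p in partials):
        sign = 1
    elif all(p < 0 for p in partials):
        sign = -1
    else:
        # The partial associations themselves do not share a sign,
        # so neither definition applies cleanly.
        return "partials disagree in sign"
    if marginal * sign < 0:
        return "reversal"
    if marginal < min(partials) or marginal > max(partials):
        # Same sign as the partials (or zero), but outside their range.
        return "amalgamation"
    return "none"


# Difference in admission rates (women minus men) from the illustration:
# marginal 0.25 - 0.40; partial 0.80 - 0.46 and 0.20 - 0.04.
print(classify(-0.15, [0.34, 0.16]))

# Hypothetical values: both partial differences are 0.10,
# but the marginal difference is 0.58.
print(classify(0.58, [0.10, 0.10]))
```

The first call reports a reversal, and the second an amalgamation paradox: the marginal association has the same sign as the partial associations but lies outside their range.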
