Don’t panic: It’s only Simpson’s Paradox


Sooner or later, it catches you off guard: your results at the overall patient level contradict the results at the subgroup level, leaving you confused or even in despair.

However, there is no need to panic. With a deeper look into the data, one can get to the bottom of this observation.

Consider the following example: In a clinical trial, the dose-response relationship of a drug was to be evaluated. Statistical analysis led to the following results:

Dose-response correlation             Gender              Overall
                                      Female    Male
Pearson correlation coefficient r     -0.49     -0.50     0.52


As expected, the analysis of the overall population showed a positive dose-response effect (r = 0.52). That is, the higher the dose, the higher the response. However, in a subgroup analysis by gender, the reverse association was found within each gender: the higher the dose, the lower the response, for both females (r = -0.49) and males (r = -0.50).
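The sign flip can be reproduced with a few lines of Python. The following is a minimal sketch on synthetic data: the (dose, response) pairs are invented for illustration and are not the trial data, and the helper names pearson_r and correlation are hypothetical. It computes the Pearson correlation within each gender and for the pooled data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation(pairs):
    """Correlation between dose and response for a list of (dose, response) pairs."""
    doses, responses = zip(*pairs)
    return pearson_r(doses, responses)


# Synthetic (dose, response) pairs, invented for illustration only.
# Females receive lower doses and respond less overall; within each
# gender, the response falls as the dose rises.
female = [(1, 4.0), (2, 3.6), (3, 3.1), (4, 2.5)]
male = [(6, 9.0), (7, 8.6), (8, 8.1), (9, 7.5)]

r_female = correlation(female)
r_male = correlation(male)
r_overall = correlation(female + male)

print(f"female: {r_female:+.2f}  male: {r_male:+.2f}  overall: {r_overall:+.2f}")
```

In this toy data set, both subgroup correlations are negative while the pooled correlation is positive, which is the same pattern as in the table above.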

What happened here?

At first sight, one might think that a rare curiosity has arisen here. But this phenomenon is well known in science and is called Simpson’s Paradox.

Simpson’s Paradox may arise if there is (at least) one confounding variable that has not been accounted for.

In our example, the factor gender influences the choice of drug dose as well as the response (as depicted in the figure below): females took lower doses and were observed to respond less to the treatment than males. That is, gender confounds the relationship between dose and response.

[Figure: Simpson's Paradox]


What can we do about it?

Unadjusted results at the aggregated patient level simply do not convey the true, more complicated structure of the dose-response relationship in the population of interest. Hence, they are not adequate for reporting and can lead to false conclusions.

Regression techniques can adjust for confounders that have been measured and included in the model. The resulting association estimates (e.g. regression coefficients, odds ratios from logistic regression) are then adjusted for these confounders, which helps us draw the right conclusions. For simple confounding structures with one or two confounding factors (like gender in our example), subgroup analyses (i.e. analyses conducted separately in subgroups) may be chosen instead.

In the current example, a regression model for response could be chosen, with dose as the explanatory variable and gender as an additional covariate to adjust for confounding. The model estimation directly yields an appropriately negative regression effect of dose on response, in line with the negative correlations found within each gender.
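The effect of such an adjustment can be sketched in plain Python. This is an illustration on synthetic data, not the trial analysis: the data and the helper names slope and center are assumptions made for this example. For a linear model with dose and a gender indicator, the dose coefficient equals the slope obtained after centering dose and response within each gender, so no regression library is needed here.

```python
def slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def center(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Synthetic (dose, response) pairs per gender, invented for illustration only.
groups = {
    "female": [(1, 4.0), (2, 3.6), (3, 3.1), (4, 2.5)],
    "male": [(6, 9.0), (7, 8.6), (8, 8.1), (9, 7.5)],
}

doses, responses = [], []
doses_centered, responses_centered = [], []
for pairs in groups.values():
    group_doses, group_responses = zip(*pairs)
    doses.extend(group_doses)
    responses.extend(group_responses)
    # Centering within gender removes the between-gender differences,
    # which is what adding a gender indicator to the model does.
    doses_centered.extend(center(group_doses))
    responses_centered.extend(center(group_responses))

crude_slope = slope(doses, responses)                        # ignores gender
adjusted_slope = slope(doses_centered, responses_centered)   # adjusted for gender

print(f"crude slope: {crude_slope:+.2f}  adjusted slope: {adjusted_slope:+.2f}")
```

In this toy example, the crude slope is positive and the gender-adjusted slope is negative. In practice, a statistics package would be used to fit the model and would also provide confidence intervals and tests.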

In summary, there is no need to despair when results at the aggregated population level and at the subgroup level contradict each other. However, identifying complicated confounding structures can be a challenging task. It is most often accomplished in an interdisciplinary team of medical researchers and biostatisticians, since identifying confounding variables and choosing techniques to handle them requires both medical insight and statistical knowledge.


Picture: @Matthias Buehner /

