Intention-To-Treat (ITT) vs. Per-Protocol (PP) analysis: what to choose?
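What is worse: (A) claiming a treatment effect that does not exist, or (B) overlooking a treatment effect that does exist?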
Indeed, from a patient’s perspective the answer might not be straightforward. However, there is a clear answer to this question in clinical research (quick spoiler: situation A is worse!).
Why ask these questions when talking about intention-to-treat (ITT) vs. per-protocol (PP)? Well, let’s start with some definitions and general explanations:
The Intention-To-Treat (ITT) principle
The intention-to-treat principle states that every patient randomized to the clinical study should enter the primary analysis. Accordingly, patients who drop out prematurely, are non-compliant with the study treatment, or even take the wrong study treatment are included in the primary analysis within the treatment group they were assigned to at randomization (“as randomized”).
Consequently, in an analysis according to the ITT principle, the original randomization and the number of patients in the treatment groups remain unchanged, the analysis population is as complete as possible, and a potential bias due to exclusion of patients is avoided. Thus, the patient set used for the primary analysis according to the ITT principle is called “full analysis set”.
There are only some specific reasons that might cause an exclusion of a patient from the full analysis set:
In addition, the ICH E9 guideline mentions “failure of major entry criteria” as a reason for exclusion. However, as these major entry criteria are quite specific and only valid under certain circumstances, they are not commonly used for the definition of a full analysis set.
The Per-Protocol (PP) principle
While an analysis according to the ITT principle aims to preserve the original randomization and to avoid potential bias due to exclusion of patients, the aim of a per-protocol (PP) analysis is to identify a treatment effect which would occur under optimal conditions; i.e. to answer the question: what is the effect if patients are fully compliant? Therefore, some patients (from the full analysis set) need to be excluded from the population used for the PP analysis (PP population).
Usually, this applies to patients fulfilling any of the following criteria:
There might be further criteria for selecting a PP population; however, the following approaches are essential:
Both approaches, the ITT and the PP approach, are valid but have different roles in the analysis of clinical studies. Let’s come back to the question at the beginning of this article: What is worse, scenario A (claim a non-existing effect) or B (neglect an existing effect)?
Type I and type II errors
To answer this, consider the essential difference between the two cases:
Case A means that a statistically proven result is actually wrong, and such a result can have dangerous consequences: based on this “proof”, an ineffective treatment might be approved and patients put in danger. Situation B, on the other hand, means that efficacy was not proven but also not refuted. However, unproven efficacy does not equal proven inefficacy! From a scientific perspective, such a non-decision has fewer implications than a wrong proof.
Therefore, in clinical trials situation A (also known as type I error) is strictly controlled via a low pre-defined level of significance: a level of 5%, for example, means that (if there is actually no effect) the probability of situation A is 5% or less. Situation B (known as type II error), on the other hand, is controlled via a meaningful sample size calculation, but usually with a less strict criterion (e.g. a type II error probability of 20%, i.e. a power of 80%).
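To make the interplay of the two error probabilities more tangible, here is a minimal sketch of a sample size calculation for comparing two response rates. It assumes a two-sided test, the usual normal approximation with unpooled variances, and illustrative response rates of 60% vs. 40%. The function name `sample_size_per_arm` and all input values are assumptions for illustration, not part of any guideline.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_treatment, p_control, alpha=0.05, power=0.80):
    """Approximate number of patients per arm for comparing two proportions.

    Assumptions (illustrative only): two-sided test, normal approximation,
    unpooled variances, equal allocation to both arms.
    """
    # Quantile that controls the type I error (significance level alpha)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Quantile that controls the type II error (beta = 1 - power)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_treatment * (1 - p_treatment) + p_control * (1 - p_control)
    n = (z_alpha + z_beta) ** 2 * variance / (p_treatment - p_control) ** 2
    return math.ceil(n)


# Hypothetical inputs: 60% vs. 40% response, 5% significance level, 80% power
print(sample_size_per_arm(0.60, 0.40))
# A stricter type II error criterion (90% power) requires more patients
print(sample_size_per_arm(0.60, 0.40, power=0.90))
```

Under these assumptions the significance level is fixed in advance, while the type II error is steered through the number of patients: lowering it from 20% to 10% increases the required sample size.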
In conclusion, it is more essential to avoid a wrong proof than to avoid a wrong non-decision (which is also bad, but A is worse…). Consequently, it is essential to keep the probability of situation A below the level of significance (e.g. 5%).
Thus, the common rule for clinical trial analyses is: be conservative! Here, “conservative” means: do not increase the probability of a type I error!
What is the consequence for the choice of a patient analysis set?
In a clinical trial (we only talk about superiority trials here, as the situation is different for non-inferiority trials), one wants to detect a benefit of treatment A (e.g. verum) compared to treatment B (e.g. placebo). The aim is to reject the statement that “treatment A is not better than treatment B” (the so-called “null hypothesis”). Rejecting it is taken as proof that “treatment A is actually better than treatment B” (that is the way statistical tests work).
Thus, a large treatment effect leads to a successful trial (i.e. to proven efficacy). However, if you choose an overly optimistic method of analysis, i.e. if you over-estimate the effect, you are more likely to obtain a positive result. In other words: you increase the probability of a type I error.
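As a rough illustration of how such a test works for a dichotomous outcome, the following sketch implements a one-sided z-test for two proportions with a pooled standard error. The function name `one_sided_z_test`, the responder counts, and the one-sided significance level of 2.5% are hypothetical choices for this example. A real trial would use the test pre-specified in its statistical analysis plan.

```python
import math
from statistics import NormalDist


def one_sided_z_test(resp_a, n_a, resp_b, n_b):
    """Test the null hypothesis "A is not better than B" for two response rates.

    Returns the z statistic and the one-sided p-value
    (normal approximation with a pooled standard error).
    """
    p_a = resp_a / n_a
    p_b = resp_b / n_b
    pooled = (resp_a + resp_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Probability of a result at least this extreme if there is no effect
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical counts: 60 of 100 responders under A, 40 of 100 under B
z, p_value = one_sided_z_test(60, 100, 40, 100)
alpha_one_sided = 0.025  # assumed significance level for this sketch
print(f"z = {z:.2f}, reject null hypothesis: {p_value < alpha_one_sided}")
```

The larger the estimated difference relative to its standard error, the smaller the p-value; this is why an over-estimated effect makes a positive result more likely.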
Therefore, in clinical trials any over-estimation of the effect needs to be avoided. With respect to prevention of type I error it is still better to choose a method which under-estimates the effect (conservative approach) than a method which might over-estimate it.
What does this general rule mean for the choice of ITT vs. PP? What is the more conservative approach in this context? The simple answer is: it’s the analysis according to the ITT principle.
In this kind of analysis, actual treatment effects are usually watered down; in other words, effects are under-estimated. This tendency is also described in common guidelines (e.g. ICH E9). It follows from the fact that non-compliant patients are also included in the full analysis set, and non-compliance is generally associated with a negative outcome (e.g., patients who dropped out at a very early stage of the study usually have a negative outcome). Provided that non-compliance occurs in all treatment arms, the differences between the treatments consequently diminish.
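This dilution can be written down for a simple special case. The following is only a sketch under two assumptions that are not taken from the guideline: the same fraction d of patients is non-compliant in both arms, and every non-compliant patient counts as a non-responder. With p_V and p_P denoting the response rates of compliant patients under verum and placebo, the expected response rates and the expected effect in the full analysis set are:

```latex
p_V^{\mathrm{ITT}} = (1-d)\,p_V, \qquad
p_P^{\mathrm{ITT}} = (1-d)\,p_P, \qquad
\Delta^{\mathrm{ITT}} = p_V^{\mathrm{ITT}} - p_P^{\mathrm{ITT}} = (1-d)\,(p_V - p_P),
\qquad\text{so}\qquad
\left|\Delta^{\mathrm{ITT}}\right| \le \left|p_V - p_P\right| .
```

Under these assumptions the effect shrinks by the factor (1 - d) but never changes its sign. If non-compliance differs between the arms, this simple relation no longer holds.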
Let’s have a look at a short example:
Consider a superiority trial with two treatment arms (verum vs. placebo) and a dichotomous outcome (response: yes or no). The real response rates, i.e. the response rates that are expected, are 60% under verum and 40% under placebo; thus, there is a real treatment effect of 20 percentage points.
Now assume that 10% of the patients in each study arm drop out of the study prematurely and are lost to follow-up (i.e., 10% dropouts, 90% completers). Due to their shortened observation period, none of the dropouts achieves response (a reasonable assumption).
Nevertheless, according to the ITT principle, all patients (including dropouts) are included in the full analysis set. Let’s have a look at the outcome:
|Treatment arm|Patients|Response rate|Responders|Response in the full analysis set|
|---|---|---|---|---|
|Verum|90 completers|60%|54|54 of 100 patients are responders (54%)|
|Verum|10 dropouts|0%|0| |
|Placebo|90 completers|40%|36|36 of 100 patients are responders (36%)|
|Placebo|10 dropouts|0%|0| |

Effect: Δ = 54% - 36% = 18 percentage points
The estimated treatment effect in this analysis is 18 percentage points, i.e. the actual treatment difference of 20 percentage points is under-estimated. However, with respect to the aim of not increasing the probability of a type I error, this “wrong” (or conservative) estimate is still better than an over-estimation of the effect.
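The arithmetic of this example can be reproduced with a few lines of code. This is a minimal sketch using the numbers from the example (100 patients per arm, 10% dropouts without response); the function name `itt_response_rate` is made up for illustration.

```python
def itt_response_rate(n_randomized, dropout_fraction, completer_response_rate):
    """Response rate in the full analysis set.

    Assumption from the example: dropouts never achieve response,
    but they stay in the denominator ("as randomized").
    """
    completers = round(n_randomized * (1 - dropout_fraction))
    responders = round(completers * completer_response_rate)
    return responders / n_randomized


verum = itt_response_rate(100, 0.10, 0.60)
placebo = itt_response_rate(100, 0.10, 0.40)
print(f"verum: {verum:.0%}, placebo: {placebo:.0%}, "
      f"effect: {(verum - placebo) * 100:.0f} percentage points")
```

Setting the dropout fraction to 0 returns the undiluted difference of 20 percentage points, which shows that the under-estimation in this example comes entirely from counting dropouts as non-responders.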
How about the PP analysis in this context? Excluding patients from the analysis due to major protocol deviations can of course also lead to a wrong estimate of the treatment effect. This is particularly the case if the frequency of and the reasons for exclusion vary between the study groups. However, for a PP analysis it is not straightforward to predict the direction of a wrong estimate (over- or under-estimation). Some authors and guidelines claim a tendency of PP analyses to over-estimate an effect (e.g. the ICH E9 guideline), although this cannot be derived mathematically.
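To illustrate why the direction is hard to predict, the sketch below compares two purely hypothetical exclusion patterns. Both start from 54 of 100 responders under verum and 36 of 100 under placebo, as in the example above. The class `Arm`, the function `effects`, and all exclusion counts are invented for illustration and are not data from any trial.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    randomized: int
    responders: int
    excluded: int = 0             # patients removed for the PP population
    excluded_responders: int = 0  # responders among the excluded patients

    def itt_rate(self):
        # Full analysis set: everyone stays in the analysis
        return self.responders / self.randomized

    def pp_rate(self):
        # PP population: excluded patients leave numerator and denominator
        return ((self.responders - self.excluded_responders)
                / (self.randomized - self.excluded))


def effects(verum, placebo):
    """Return (ITT effect, PP effect) as differences in response rates."""
    return (verum.itt_rate() - placebo.itt_rate(),
            verum.pp_rate() - placebo.pp_rate())


# Hypothetical scenario 1: mostly placebo responders are excluded
itt_1, pp_1 = effects(Arm(100, 54, 10, 0), Arm(100, 36, 10, 6))
# Hypothetical scenario 2: mostly verum responders are excluded
itt_2, pp_2 = effects(Arm(100, 54, 10, 8), Arm(100, 36, 10, 0))

print(f"scenario 1: ITT {itt_1:.3f}, PP {pp_1:.3f}")
print(f"scenario 2: ITT {itt_2:.3f}, PP {pp_2:.3f}")
```

With identical ITT data, the PP estimate ends up above the ITT estimate in the first scenario and below it in the second. Which patients are excluded, and why, determines the direction.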
In summary, the ITT approach, which tends to under-estimate an effect, is the more conservative approach in a clinical (superiority) trial. Following the general analysis rule above (stay conservative!), an analysis of the full analysis set according to the ITT principle is the method of choice for the primary analysis.
Nevertheless, a PP approach is of course a reasonable analysis strategy for sensitivity analyses. In any case, if within a trial the results of the ITT and the PP analysis differ considerably, this is always a reason to start asking unpleasant questions.