If both statistical measures, the mean and the median, describe the location of a set of data, what are the advantages and disadvantages of each?
As mentioned above, the mean is the more commonly used measure of the two. Moreover, it is the basis of many advanced statistical methods.
For example, the mean is needed to calculate the standard deviation, which is the most prominent measure of the variability in a set of data. It is also needed for many statistical testing procedures, e.g. for the t-test.
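To make the link between the mean and the standard deviation concrete, here is a minimal Python sketch using the five blood pressure values from this lesson. It assumes the sample standard deviation (dividing by n - 1); the text does not say which variant is meant, and the variable names are chosen only for illustration.

```python
from math import sqrt
from statistics import mean, stdev

values = [142, 124, 121, 151, 132]

# Step 1: the mean is computed first ...
m = mean(values)

# Step 2: ... because every deviation is measured from the mean.
squared_deviations = [(x - m) ** 2 for x in values]

# Assumption: sample standard deviation, i.e. division by n - 1.
sample_sd = sqrt(sum(squared_deviations) / (len(values) - 1))

print(m, round(sample_sd, 2))

# The library function gives the same result.
print(round(stdev(values), 2))
```

Without the mean, the deviations in step 2 could not be formed, which is why the mean is a prerequisite for this measure.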
But then, what are the advantages of the median?
To illustrate this, we return to the five systolic blood pressure values used before:
142, 124, 121, 151, 132.
We assume that 151 is the correct value, but that a device failure leads to a false measurement of 171 being recorded in its place. Let’s see what happens to the mean and the median.
The mean of the resulting five values is now 138 instead of 134, the value calculated from the original data. The incorrect measurement thus has a considerable effect on the mean.
To derive the median, we sort the data again by size:
121, 124, 132, 142, 171.
As before, the value 132 is in the centre of the sorted data, so the median is unaltered by the false measurement.
That is why the median is called “robust against outliers”, whereas the mean is “sensitive to outliers”.
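The comparison above can be reproduced with a few lines of Python. This is only a sketch using the standard library; the list names are chosen here for illustration.

```python
from statistics import mean, median

# Original data and the same data with 151 replaced by the false value 171.
original = [142, 124, 121, 151, 132]
with_error = [142, 124, 121, 171, 132]

# Original data: mean 134, median 132.
print(mean(original), median(original))

# With the false measurement: the mean rises to 138, the median stays at 132.
print(mean(with_error), median(with_error))
```

The median function sorts the values internally, so the data do not need to be ordered beforehand.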
Another advantage of the median, associated with this kind of robustness, can be seen in “skewed” distributions.
An example of such a distribution in the context of an observational study is the time since the onset of a particular disease. In many cases, the date of diagnosis is close to the time of reporting, i.e. at or just a few days prior to the baseline visit. However, the study group often also includes patients who have been suffering from the disease for many years.
If we calculate the mean of the individual time spans since disease onset, these few large values have an enormous impact, making the mean much larger than the bulk of the data would suggest.
The good news is that such extreme values don’t have this effect on the median. Therefore, the median gives a more realistic picture of the data here.
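The same effect can be shown for a skewed distribution. The numbers below are invented purely for illustration and do not come from any study: eight hypothetical patients diagnosed within a few days of the baseline visit, and two who have had the disease for roughly 8 and 12 years.

```python
from statistics import mean, median

# Hypothetical times since disease onset in days (invented example data).
days_since_onset = [2, 3, 5, 1, 4, 7, 3, 2920, 4380, 6]

skewed_mean = mean(days_since_onset)
skewed_median = median(days_since_onset)

# The mean is about 733 days, although eight of ten patients
# were diagnosed within the last week.
print(skewed_mean)

# The median is 4.5 days, which reflects the typical patient much better.
print(skewed_median)
```

In this made-up example the two long-standing cases pull the mean to around two years, while the median stays within the range of the recently diagnosed majority.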