Monday, September 9, 2013

Five-number summary

If you see me write something that looks like this:
 
  1  -[  4  22  70  ]-  296
 
you're looking at a five-number summary. This is a handy way of summarising a numerical data set. Traditionally, the values are

  minimum -[ 25th percentile | median | 75th percentile ]- maximum

The interpretation of the summary above is as follows.
  • The typical data (the middle 50% of the values) lie between 4 and 70 (between the 25th and 75th percentiles).
  • The median, 22, is the value that divides the bottom half and top half of the data. So 50% of the data lie below 22 and 50% of the data lie above 22.
  • The minimum value is 1.
  • The maximum value is 296.
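
As a rough illustration of the summary above, here is a minimal Python sketch that computes the five numbers and prints them in the bracket layout. The helper names (percentile, five_number_summary, format_summary) and the sample data are made up for this example, and the percentile uses linear interpolation between neighbouring values, which is only one of several common conventions; other tools may give slightly different quartiles.

```python
def percentile(sorted_values, p):
    """Return the p-th percentile (0-100) of an already sorted list.

    Assumption: linear interpolation between the two nearest ranks.
    """
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = (len(sorted_values) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def five_number_summary(data):
    """Return (minimum, 25th percentile, median, 75th percentile, maximum)."""
    values = sorted(data)
    return tuple(percentile(values, p) for p in (0, 25, 50, 75, 100))


def format_summary(summary):
    """Render the five numbers in the bracket layout used in this post."""
    lo, q1, med, q3, hi = summary
    return f"{lo:g}  -[  {q1:g}  {med:g}  {q3:g}  ]-  {hi:g}"


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up data, not the data set behind the example above
print(format_summary(five_number_summary(sample)))
# prints: 1  -[  3  5  7  ]-  9
```
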
I prefer replacing the minimum with the 1st percentile and the maximum with the 99th percentile since these are more stable (less sensitive to outliers in the data set).

  1st perc. -[ 25th perc. | median | 75th perc. ]- 99th perc.
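
To see why the 1st and 99th percentiles are more stable, here is a small sketch using statistics.quantiles from the Python standard library (Python 3.8 or later). The function name robust_summary and the data are invented for illustration, and the "inclusive" interpolation method is my choice here, not the only reasonable one. Adding a single extreme value changes the maximum enormously but barely moves the 99th percentile.

```python
import statistics


def robust_summary(data):
    """Return (1st, 25th, 50th, 75th, 99th) percentiles of data.

    statistics.quantiles(..., n=100) returns 99 cut points, so the
    k-th percentile sits at index k - 1.
    """
    cuts = statistics.quantiles(data, n=100, method="inclusive")
    return tuple(cuts[k - 1] for k in (1, 25, 50, 75, 99))


clean = list(range(1, 101))   # made-up data: 1, 2, ..., 100
dirty = clean + [10000]       # the same data plus one outlier

print("max without / with outlier:", max(clean), max(dirty))
print("99th perc. without / with outlier:",
      robust_summary(clean)[4], robust_summary(dirty)[4])
```

In this example the maximum jumps from 100 to 10000, while the 99th percentile only shifts from about 99 to 100.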