Box plots
In a nutshell
Box plots are diagrams which represent the spread and location of data. Box plots can be used to efficiently compare two or more data sets.
Features of a box plot
Before drawing a box plot, certain features of a data set need to be defined and calculated. These include the quartiles, median, maximum, minimum and outliers of the data set. The table below defines each of these features and, where applicable, gives the formula used to calculate it for a discrete data set with n entries.
Feature | Definition | Formula |
Lower quartile (Q1) | The value under which 25% of the data points are found. | (n+1)/4 th point |
Median (Q2) | The value under which 50% of the data points are found. | (n+1)/2 th point |
Upper quartile (Q3) | The value under which 75% of the data points are found. | 3(n+1)/4 th point |
Interquartile range (IQR) | The size of the range which contains the middle 50% of the data points. | Q3−Q1 |
Outliers | Extreme values which lie outside the normal trend of a data set, which is determined by the constant k. The constant k will be given or implied in a question. | Greater than Q3+k(IQR) or less than Q1−k(IQR) |
Maximum | Highest value of data set which is not an outlier or at the boundary of an outlier. | |
Minimum | Lowest value of data set which is not an outlier or at the boundary of an outlier. | |
Note: When calculating quartiles or the median, the formula gives a position in the ordered data set. If the position is a whole number, take the data point at that position. If the position ends in .5, the value of the given quartile is halfway between the data points above and below it. If the position is any other decimal number, round up to the next data point.
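The position formulae and the rounding rule in the note can be sketched in Python. This is only an illustration: the function names and the sample data are made up for this sketch, and it assumes the data set is large enough that every calculated position falls inside it.

```python
import math


def value_at_position(sorted_data, position):
    """Return the value at a 1-based position, which may be fractional."""
    if position == int(position):
        # Whole number: take that data point directly.
        return sorted_data[int(position) - 1]
    if position % 1 == 0.5:
        # Ends in .5: halfway between the data points below and above.
        below = sorted_data[math.floor(position) - 1]
        above = sorted_data[math.ceil(position) - 1]
        return (below + above) / 2
    # Any other decimal: round up to the next data point.
    return sorted_data[math.ceil(position) - 1]


def quartiles(data):
    """Return (Q1, Q2, Q3) using the (n+1)/4, (n+1)/2 and 3(n+1)/4 positions."""
    ordered = sorted(data)
    n = len(ordered)
    q1 = value_at_position(ordered, (n + 1) / 4)
    q2 = value_at_position(ordered, (n + 1) / 2)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q2, q3


# Illustrative data only (not taken from the text): n = 8,
# so the positions are 2.25, 4.5 and 6.75.
sample = [2, 4, 4, 5, 7, 8, 9, 12]
print(quartiles(sample))  # (4, 6.0, 9)
```

Here position 2.25 rounds up to the 3rd value, position 4.5 gives the midpoint of the 4th and 5th values, and position 6.75 rounds up to the 7th value.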
Example 1
Olivia is collecting data about the ages of her friends' siblings. The answers she collects are 1,1,2,2,3,4,5,5,5,5,7,7,7,7,8. Given that there are no outliers, find the maximum, minimum, quartiles and median of ages collected.
Lower quartile | Position of Q1 = (n+1)/4 = (15+1)/4 = 4th data point; 4th value = 2 |
Upper quartile | Position of Q3 = 3(n+1)/4 = 3(15+1)/4 = 12th data point; 12th value = 7 |
Median | Position of Q2 = (n+1)/2 = (15+1)/2 = 8th data point; 8th value = 5 |
Interquartile range | IQR=UQ−LQ=7−2=5 |
Maximum | 8 |
Minimum | 1 |
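As a rough check of the outlier condition for this data set, the boundaries Q1−k(IQR) and Q3+k(IQR) can be computed directly. The example does not state k, so the value k = 1.5 below is an assumption made purely for illustration, and the function name is hypothetical.

```python
def min_max_outliers(data, q1, q3, k):
    """Split data into outliers and non-outliers using the k(IQR) boundaries."""
    iqr = q3 - q1
    lower_boundary = q1 - k * iqr
    upper_boundary = q3 + k * iqr
    # Outliers lie strictly beyond the boundaries.
    outliers = [x for x in data if x < lower_boundary or x > upper_boundary]
    inside = [x for x in data if lower_boundary <= x <= upper_boundary]
    return min(inside), max(inside), outliers


ages = [1, 1, 2, 2, 3, 4, 5, 5, 5, 5, 7, 7, 7, 7, 8]
# k = 1.5 is an assumed value; the question would normally give or imply k.
minimum, maximum, outliers = min_max_outliers(ages, q1=2, q3=7, k=1.5)
print(minimum, maximum, outliers)  # 1 8 []
```

With this assumed k the boundaries are −5.5 and 14.5, so no age is an outlier, which agrees with the statement in the question.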
Drawing a box plot
Drawing a box plot involves using an appropriate scale to mark all the features of a data set as defined above.
Procedure
1. | Draw an appropriate scale and label it. |
2. | Mark Q1, Q2 and Q3 with vertical lines of an equal length. Use Q1 and Q3 to draw a box which is separated by Q2. |
3. | Mark on the maximum and minimum with equally sized vertical lines and connect to the box with a line. |
4. | Use a cross (x) to denote any outliers beyond the minimum or maximum values. |
Example 2
Draw a box plot to represent Olivia's data set.
Note: Sometimes outliers and quartiles will be given but not the maximum or minimum values of a data set. In this case, the effective maximum and minimum will respectively be the largest and smallest values which are not outliers.
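The drawing procedure can be imitated with a rough text rendering. The sketch below is not a substitute for a properly drawn diagram: the function name, the character choices and the fixed 0 to 9 axis are all assumptions made for illustration, and it uses the values found for Olivia's data in Example 1.

```python
def text_box_plot(minimum, q1, median, q3, maximum, outliers=(), lo=0, hi=9, scale=2):
    """Return a two-line text box plot: the plot row and a labelled axis."""

    def col(value):
        # Column index of a value, using `scale` characters per unit.
        return round((value - lo) * scale)

    row = [" "] * (col(hi) + 1)
    for c in range(col(minimum), col(maximum) + 1):
        row[c] = "-"  # line joining the minimum and maximum to the box
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="  # the box from Q1 to Q3
    for value in (minimum, q1, median, q3, maximum):
        row[col(value)] = "|"  # vertical marks for the five key values
    for value in outliers:
        row[col(value)] = "x"  # crosses for outliers

    axis = [" "] * len(row)
    for tick in range(lo, hi + 1):
        axis[col(tick)] = str(tick % 10)  # last digit only, to keep one character
    return "".join(row) + "\n" + "".join(axis)


# Values from Example 1: minimum 1, Q1 2, median 5, Q3 7, maximum 8.
print(text_box_plot(1, 2, 5, 7, 8))
#   |-|=====|===|-|
# 0 1 2 3 4 5 6 7 8 9
```

The box spans 2 to 7 and is split at the median of 5, with lines reaching out to the minimum at 1 and the maximum at 8.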
Comparing box plots
Box plots can be used to compare the location and spread of data sets. Use the context of the question to interpret each comparison. Location is usually compared using the medians, and spread is usually compared using the interquartile range and the range.
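A comparison of this kind can be sketched in Python from the five values read off each box plot. The function name and the two sets of marks below are hypothetical and chosen only to illustrate the idea; they are not taken from any example in this text.

```python
def compare_box_plots(name_a, summary_a, name_b, summary_b):
    """Compare two (minimum, Q1, median, Q3, maximum) summaries.

    Returns one sentence each for the median, interquartile range and range.
    """

    def measures(summary):
        minimum, q1, median, q3, maximum = summary
        return {
            "median": median,                 # measure of location
            "interquartile range": q3 - q1,   # measure of spread
            "range": maximum - minimum,       # measure of spread
        }

    a, b = measures(summary_a), measures(summary_b)
    sentences = []
    for measure in a:
        if a[measure] > b[measure]:
            relation = "greater than"
        elif a[measure] < b[measure]:
            relation = "less than"
        else:
            relation = "equal to"
        sentences.append(
            f"The {measure} for {name_a} ({a[measure]}) is {relation} "
            f"that for {name_b} ({b[measure]})."
        )
    return sentences


# Hypothetical summaries for two classes, for illustration only.
for sentence in compare_box_plots("class A", (35, 50, 62, 70, 90),
                                  "class B", (40, 52, 60, 64, 78)):
    print(sentence)
```

The sentences only state the comparison; interpreting them in the context of the question still has to be done separately.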
Example 3
The two box plots below summarise the A-level marks in 2021 and 2022 in a given school. Without calculating individual values, compare the box plots and give your interpretation.
Compare the plots:
The median mark for 2022 is slightly lower than the median mark for 2021. The interquartile range and range for 2022 are both smaller than those for 2021.
Interpret the data:
In 2021, students generally achieved slightly higher marks in their A-levels than in 2022. However, students in 2022 performed more consistently, with a tighter spread of results.
Note: Sometimes the box plots will be drawn on different scales. Make sure to compare them against the same scale.