Outliers

Tutor: Labib

Summary

​​In a nutshell

Outliers are extreme values that lie outside the general trend of a data set. They can be identified in different ways, depending on the definition given in the question.


Variable definitions

| Variable name | Symbol |
| --- | --- |
| Upper quartile | $Q_3$ |
| Lower quartile | $Q_1$ |
| Constant | $k$ |


​​Equations

| Description | Equation |
| --- | --- |
| Upper boundary: values greater than this are outliers. | $Q_3 + k(Q_3 - Q_1)$ |
| Lower boundary: values less than this are outliers. | $Q_1 - k(Q_3 - Q_1)$ |
| The interquartile range. | $Q_3 - Q_1$ |




Finding outliers

Use the given equations to find outliers from a data set. The $k$ value will be provided or implied within the question.

$\boxed{Q_3 + k(Q_3-Q_1) \quad \text{and} \quad Q_1 - k(Q_3-Q_1)}$
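The two boundary equations can be sketched in a few lines of Python. This is only an illustration: the function names `outlier_limits` and `find_outliers` are made up here, and the numbers in the usage lines are invented values rather than data from the lesson.

```python
def outlier_limits(q1, q3, k):
    """Return the (lower, upper) boundaries for a given k."""
    iqr = q3 - q1  # interquartile range, Q3 - Q1
    lower = q1 - k * iqr  # Q1 - k(Q3 - Q1)
    upper = q3 + k * iqr  # Q3 + k(Q3 - Q1)
    return lower, upper


def find_outliers(values, q1, q3, k):
    """Return the values that fall outside the two boundaries."""
    lower, upper = outlier_limits(q1, q3, k)
    return [x for x in values if x < lower or x > upper]


# Invented illustration: Q1 = 10, Q3 = 20, k = 1.5
print(outlier_limits(10, 20, 1.5))
print(find_outliers([-8, 3, 12, 40], 10, 20, 1.5))
```

Here `k` is passed in explicitly because the question decides its value.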


Example 1

The masses of $30$ bunnies are recorded. The results, in kg, are shown below:

$1.6, \ 2.3, \ 2.4, \ 2.4, \ 2.6, \ 2.7, \ 2.9, \ 3.1, \ 3.7, \ 3.8, \ 3.8, \ 3.8, \ 3.9, \ 3.9, \ 3.9,$
$4.1, \ 4.1, \ 4.1, \ 4.3, \ 4.6, \ 4.8, \ 5.0, \ 5.0, \ 5.0, \ 5.2, \ 5.2, \ 5.3, \ 5.4, \ 5.5, \ 5.8$


A value is an outlier if it lies more than $1.5 \times \text{interquartile range}$ above the upper quartile or more than $1.5 \times \text{interquartile range}$ below the lower quartile. Does the data set have any outliers?


Find the lower quartile, which is a quarter of the way through the data points:

$30 \times \dfrac14 = 7.5 \rightarrow 8^{th} \ \textit{term}$

$8^{th} \ \textit{term} = 3.1$


Find the upper quartile, which is three quarters of the way through the data points:

$30 \times \dfrac34 = 22.5 \rightarrow 23^{rd} \ \textit{term}$

$23^{rd} \ \textit{term} = 5.0$


Find the interquartile range:

$5.0 - 3.1 = 1.9$


Outliers are values less than:

$3.1 - (1.5 \times 1.9) = 0.25$


Outliers are values greater than:

$5.0 + (1.5 \times 1.9) = 7.85$

The smallest value is $1.6$ and the largest is $5.8$, so every value lies between $0.25$ and $7.85$. There are no outliers in the data set.


Note: When the calculated position of a quartile is not a whole number, round up to the next whole position, as in this example.
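Example 1 can be checked with a short Python sketch. The helper name `term_at` is made up for this illustration, and it only applies the round-up rule used above. It does not handle the case where the position comes out as a whole number, for which a question or exam board may specify a different rule.

```python
import math

# Bunny masses in kg from Example 1, already in ascending order
masses = [1.6, 2.3, 2.4, 2.4, 2.6, 2.7, 2.9, 3.1, 3.7, 3.8,
          3.8, 3.8, 3.9, 3.9, 3.9, 4.1, 4.1, 4.1, 4.3, 4.6,
          4.8, 5.0, 5.0, 5.0, 5.2, 5.2, 5.3, 5.4, 5.5, 5.8]


def term_at(sorted_values, numerator, denominator=4):
    """Return the term at n * numerator / denominator, rounding the position up."""
    position = math.ceil(len(sorted_values) * numerator / denominator)
    return sorted_values[position - 1]  # positions start at 1, list indices at 0


q1 = term_at(masses, 1)  # 30 * 1/4 = 7.5, so the 8th term
q3 = term_at(masses, 3)  # 30 * 3/4 = 22.5, so the 23rd term
iqr = q3 - q1

lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = [x for x in masses if x < lower or x > upper]

print(q1, q3, outliers)
```

The list of outliers comes out empty, which agrees with the conclusion of the worked example.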



Cleaning data

Questions may use different definitions for outliers. This may involve applying different equations, such as those for the mean and standard deviation. Once outliers have been identified, any that are anomalies can be removed from the data set, provided there is a justifiable reason. This is known as cleaning data.


​​Example 2

The heights of $10$ tigers, in metres, are given in the data set below:

$1.2, \ 1.5, \ 3.1, \ 3.2, \ 3.5, \ 3.7, \ 4.0, \ 4.1, \ 5.3, \ 8.4$


Two sums are also given:

$\sum x = 38 \quad \quad \sum x^2 = 180.94$


For this question, outliers are values more than one standard deviation away from the mean. Find the outliers and use this information to clean the data set, giving reasons.


Calculate the mean:

$\text{Mean} = \overline{x} = \dfrac{\sum x}{n} = \dfrac{38}{10} = 3.8$


Calculate the standard deviation:

$\text{Variance} = \dfrac{\sum x^2}{n} - \overline{x}^2 = \dfrac{180.94}{10} - 3.8^2 = 3.654$


$\text{Standard deviation} = \sqrt{\text{Variance}} = \sqrt{3.654} = 1.9 \ (1 \ \text{d.p.})$


Find the outliers:

Values smaller than $3.8 - 1.9 = 1.9$

Values bigger than $3.8 + 1.9 = 5.7$


Hence the outliers are $\underline{1.2, \ 1.5, \ 8.4}$.


Identify the anomaly with a reason:

$1.2$ and $1.5$ may not be anomalous, as it is possible that these are juvenile tigers.

It is highly unlikely that a tiger is $8.4 \ m$ because this is significantly larger than $\overline{x}+\sigma$, with a difference of almost $3 \ m$.


Rewrite the cleaned data set:

$\underline{1.2, \ 1.5, \ 3.1, \ 3.2, \ 3.5, \ 3.7, \ 4.0, \ 4.1, \ 5.3}$
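The arithmetic in Example 2 can be reproduced with a short Python sketch. It is an illustration rather than a required method. It uses the unrounded standard deviation instead of the $1.9$ used above, which gives the same three outliers. The decision about which outlier is a true anomaly still needs human judgement, so the `anomalies` list is entered by hand.

```python
import math

# Tiger heights in metres from Example 2
heights = [1.2, 1.5, 3.1, 3.2, 3.5, 3.7, 4.0, 4.1, 5.3, 8.4]

n = len(heights)
mean = sum(heights) / n
# Variance = (sum of x^2) / n - mean^2
variance = sum(x * x for x in heights) / n - mean ** 2
sd = math.sqrt(variance)

# Outliers are more than one standard deviation from the mean
lower = mean - sd
upper = mean + sd
outliers = [x for x in heights if x < lower or x > upper]

# Judgement step from the worked example: only 8.4 is treated as an anomaly
anomalies = [8.4]
cleaned = [x for x in heights if x not in anomalies]

print(round(mean, 1), round(sd, 1))
print(outliers)
print(cleaned)
```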




