An outlier is any value that lies more than 1.5 × interquartile range above the upper quartile or more than 1.5 × interquartile range below the lower quartile. Does the data set have any outliers?
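Find the lower quartile, which is a quarter of the way through the data points:
$30 \times \frac{1}{4} = 7.5 \rightarrow \text{8th term}$
$\text{8th term} = 3.1$
Find the upper quartile, which is three quarters of the way through the data points:
$30 \times \frac{3}{4} = 22.5 \rightarrow \text{23rd term}$
$\text{23rd term} = 5.0$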
Find the interquartile range:
5.0−3.1=1.9
Outliers are values less than:
3.1−(1.5×1.9)=0.25
Outliers are values greater than:
5.0+(1.5×1.9)=7.85
There are no outliers in the data set.
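The boundary calculation above can be sketched in Python. This is a minimal illustration, not a standard library routine: the function names `iqr_fences` and `find_outliers` and the parameter `k` are assumptions made for this example, and the quartiles 3.1 and 5.0 are taken from the working above.

```python
def iqr_fences(lower_quartile, upper_quartile, k=1.5):
    """Return (lower_fence, upper_fence) for the k x IQR outlier rule."""
    iqr = upper_quartile - lower_quartile
    return lower_quartile - k * iqr, upper_quartile + k * iqr


def find_outliers(values, lower_quartile, upper_quartile, k=1.5):
    """Return the values lying outside the two fences."""
    low, high = iqr_fences(lower_quartile, upper_quartile, k)
    return [v for v in values if v < low or v > high]


# Quartiles from the worked example: LQ = 3.1, UQ = 5.0
low, high = iqr_fences(3.1, 5.0)
print(round(low, 2), round(high, 2))  # 0.25 7.85
```

Passing the full data set to `find_outliers` with the same quartiles would return an empty list if, as concluded above, no value lies below 0.25 or above 7.85.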
Note: When the calculated position of a quartile is not a whole number, always round up.
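The rounding rule can be sketched as a short Python helper. The name `quartile_position` is hypothetical and positions are counted from 1, as in the working above. If the calculated position is already a whole number, some courses take the midpoint of that term and the next instead; this sketch does not handle that case.

```python
import math


def quartile_position(n, fraction):
    """1-based position of a quartile among n ordered values, rounding up."""
    return math.ceil(n * fraction)


print(quartile_position(30, 0.25))  # 8  (lower quartile is the 8th term)
print(quartile_position(30, 0.75))  # 23 (upper quartile is the 23rd term)
```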
Cleaning data
Questions may use different definitions of an outlier. These may involve other formulas, such as those for the mean and standard deviation. Once outliers are identified, anomalies can be removed from the data set, provided there is a justifiable reason. This is known as cleaning data.
Example 2
The heights of 10 tigers, in metres, are given in the data set below:
1.2,1.5,3.1,3.2,3.5,3.7,4.0,4.1,5.3,8.4
Two sums are also given:
$\sum x = 38 \qquad \sum x^2 = 180.94$
Outliers are values more than one standard deviation away from the mean. Find the outliers and use this information to clean the data set, justifying your decisions with reasons.
Calculate the mean:
$\text{Mean} = \bar{x} = \frac{\sum x}{n} = \frac{38}{10} = 3.8$
Calculate the standard deviation:
$\text{Variance} = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{180.94}{10} - 3.8^2 = 3.654$
$\text{Standard deviation} = \sqrt{\text{Variance}} = \sqrt{3.654} = 1.9 \text{ (1 d.p.)}$
Find the outliers:
smaller than 3.8−1.9=1.9
bigger than 3.8+1.9=5.7
Hence the outliers are 1.2,1.5,8.4.
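As a check on the working, the same steps can be sketched in Python. The variable names are illustrative. The sums are recomputed from the data rather than taken from the question, and the standard deviation is rounded to 1 d.p. to mirror the example.

```python
import math

heights = [1.2, 1.5, 3.1, 3.2, 3.5, 3.7, 4.0, 4.1, 5.3, 8.4]

n = len(heights)
sum_x = sum(heights)                   # 38 in the question
sum_x2 = sum(x * x for x in heights)   # 180.94 in the question

mean = sum_x / n
variance = sum_x2 / n - mean ** 2
std_dev = round(math.sqrt(variance), 1)  # rounded to 1 d.p. as in the example

# Outliers lie more than one standard deviation from the mean
lower, upper = mean - std_dev, mean + std_dev
outliers = [x for x in heights if x < lower or x > upper]
print(outliers)  # [1.2, 1.5, 8.4]
```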
Identify the anomaly with a reason:
1.2 and 1.5 may not be anomalous as it is possible these are juvenile tigers.
It is highly unlikely that a tiger is 8.4 m tall, because this is significantly larger than $\bar{x} + \sigma = 5.7$, a difference of almost 3 m.
Rewrite the cleaned data set:
1.2,1.5,3.1,3.2,3.5,3.7,4.0,4.1,5.3
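The cleaning step can be sketched in Python as follows. The dictionary `justified_removals` is a hypothetical way of recording the reason alongside each removed value. Only outliers with a stated justification are dropped, so 1.2 and 1.5 stay in the data set.

```python
heights = [1.2, 1.5, 3.1, 3.2, 3.5, 3.7, 4.0, 4.1, 5.3, 8.4]

# Each removed value is recorded with the reason it is treated as an anomaly
justified_removals = {
    8.4: "implausibly large for a tiger",
}

cleaned = [x for x in heights if x not in justified_removals]
print(cleaned)  # [1.2, 1.5, 3.1, 3.2, 3.5, 3.7, 4.0, 4.1, 5.3]
```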
FAQs - Frequently Asked Questions
What is the most common equation for outliers?
Two boundaries are commonly used. Outliers are values that are less than: Lower quartile − k(interquartile range), or values that are greater than: Upper quartile + k(interquartile range). A common choice is k = 1.5.
What is cleaning data?
Removing anomalies from a data set.
What is an outlier?
Outliers are extreme values outside the trend of a data set.