Scatter graphs
In a nutshell
Scatter graphs are used to determine whether or not there is a correlation between two variables. If there is a straight line of best fit that accurately matches the trend of the data points on the graph, then there is a correlation between the variables. This line can be used to make estimates of the data.
Correlation
Definition
Two variables are correlated if there is a relationship, or trend between them. This means that if one variable changes, so does the other in a similar way.
The following terms can be used to describe the correlation between variables:
Term | Description |
Strong | When the two variables are closely related, so the data closely matches the trend between the variables. |
Weak | When the two variables are not closely related, and so the data points do not closely match the trend between the variables. |
Positive | Correlation such that as one variable increases, so does the other. |
Negative | Correlation such that as one variable increases, the other variable decreases. |
Note: If there is no discernible pattern between the data points, this is classed as 'no correlation'.
Line of best fit
This is the straight line that most accurately matches up with the data points. As this is a straight line, you can work out the gradient. If this is positive, this means there is a positive correlation between the variables; if negative, it means there is a negative correlation between the variables. If there is no straight line possible such that the data matches up with the line, this means that there is no correlation between the variables.
Example
Plot a scatter graph of the following data of heights and weights of students and, if applicable, draw a line of best fit. Hence describe the correlation between the data.
As the line of best fit very closely matches with the data, this data set suggests a strong, positive correlation between the heights and weights of students.
Note: When drawing a line of best fit, the line should not join all the points together. Instead, draw a straight line that most accurately reflects the "trend" of the data. You should aim to have roughly the same number of points above and below the line of best fit.