Linear regression
In a nutshell
When two variables are linearly correlated, which shows up in a scatter diagram as points clustering around a straight line, it is possible to draw a line of best fit through the data. One such line is the least squares regression line.
Least squares regression line
This is normally just called the regression line. It is the line that minimises the sum of the squares of the vertical distances between each point and the line.
The regression line need not pass through any of the points in the scatter diagram. It is defined by y=a+bx, where a is the y-intercept and b is the slope:
- The slope b will have a positive value when the variables are positively correlated.
- The slope b will have a negative value when the variables are negatively correlated.
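As a rough illustration of how a and b can be obtained, the sketch below uses the standard least squares formulas: b is the sum of products of deviations divided by the sum of squared deviations in x, and a is chosen so the line passes through the point of means. The function name least_squares_line and the data values are assumptions made for this example only; they do not come from the text.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x fitted by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sum of squared deviations of x from its mean
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx            # slope
    a = mean_y - b * mean_x    # intercept: the line passes through (mean_x, mean_y)
    return a, b


# Hypothetical data, invented for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")
```

Here the slope comes out positive, which matches the positive correlation in the made-up data.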
Note: Given the regression line and a value of the independent variable, you can estimate the corresponding value of the dependent variable. If the independent variable value lies inside the data range, you are interpolating. If it lies outside that range, you are extrapolating, which is less reliable because the linear trend may not continue beyond the observed data.
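The distinction in the note can be sketched in a few lines: compute the estimate from y=a+bx, then compare x with the smallest and largest observed x values. The function name predict, the coefficients and the observed x values below are all hypothetical, chosen only to show the idea.

```python
def predict(a, b, x, observed_xs):
    """Estimate y from y = a + b*x and say whether x lies inside the data range."""
    y = a + b * x
    inside = min(observed_xs) <= x <= max(observed_xs)
    kind = "interpolation" if inside else "extrapolation"
    return y, kind


# Hypothetical regression line and observed x values
a, b = 2.2, 0.6
observed_xs = [1, 2, 3, 4, 5]

print(predict(a, b, 3, observed_xs))   # x = 3 is inside the range 1 to 5
print(predict(a, b, 10, observed_xs))  # x = 10 is outside the range 1 to 5
```

A value on the boundary of the range is treated as an interpolation in this sketch.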
Example 1
Consider the following scatter diagram, with a regression line defined by y=1.14x+0.01.
Interpret the gradient of this regression line.
Find y when x=23. Is this an interpolation or an extrapolation?
When asked to make an interpretation, always answer in the context of the question. In this case there is no context, so simply state the meaning of the gradient:
For each increase of 1 in x, the predicted value of y increases by 1.14.
To find y when x=23, use the regression equation:
y=1.14×23+0.01=26.23
This is an extrapolation, because x=23 lies outside the range of x values in the data.
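The arithmetic in Example 1 can be checked with a short script. It encodes the given line y=1.14x+0.01, evaluates it at x=23, and confirms that a step of 1 in x changes y by the gradient. The function name regression_line and the x values 10 and 11 used for the step are arbitrary choices for this sketch.

```python
def regression_line(x):
    """Regression line from Example 1: y = 1.14x + 0.01."""
    return 1.14 * x + 0.01


# Value of y at x = 23, rounded to 2 decimal places
y_at_23 = round(regression_line(23), 2)
print(y_at_23)

# Change in y when x increases by 1 (any two x values one unit apart will do)
step = round(regression_line(11) - regression_line(10), 2)
print(step)
```

Rounding is used because floating-point arithmetic can leave tiny errors in the last digits.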