Coefficient of Determination

What is the Coefficient of Determination?

After finding the best-fit linear regression line for our data, the next question is: how well does that line actually represent or explain our data?

The measure that answers this question is the Coefficient of Determination, denoted as $r^2$ (read: r-squared).

Simply put, $r^2$ tells us the proportion or percentage of the variation (ups and downs in values) in the dependent variable (Y) that can be explained by the variation in the independent variable (X) using our linear regression model.

Coefficient of Determination from a Scatter Diagram

The value of $r^2$ is closely related to how tightly the data points cluster around the regression line:

  1. High $r^2$ (approaching 1 or 100%)

    Figure: High $r^2$. Data points are very close to the regression line.

    In this diagram, the data points are tightly packed around the regression line. This indicates a high $r^2$ value (for example, around 0.95 or 95%). It means that most of the variation in Y values is explained well by the regression line (that is, by variable X).

  2. Low $r^2$ (approaching 0 or 0%)

    Figure: Low $r^2$. Data points are scattered far from the regression line.

    In contrast, the points in this diagram are more spread out from the regression line (the residual lines are longer). This indicates a low $r^2$ value (for example, around 0.40 or 40%). The regression line is not very good at explaining the variation in Y values: less than half of the variation in Y can be explained by X through this model.
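To make the link between scatter and $r^2$ concrete, here is a minimal sketch in Python. It assumes a least-squares line with an intercept and uses the residual form $1 - SS_{\text{residual}} / SS_{\text{total}}$, which for simple linear regression gives the same number as $r^2$. The data sets, the alternating offsets that stand in for scatter, and the function names are all made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = ss_xy / ss_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared_from_residuals(xs, ys):
    """1 - (squared residuals around the line) / (total squared variation in Y)."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total


# Hypothetical data: the same trend y = 2x + 1, pushed off the line by a
# small or a large alternating offset (a stand-in for scatter).
xs = list(range(1, 11))
offsets = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]
tight_ys = [2 * x + 1 + 0.5 * d for x, d in zip(xs, offsets)]
scattered_ys = [2 * x + 1 + 6 * d for x, d in zip(xs, offsets)]

r2_tight = r_squared_from_residuals(xs, tight_ys)
r2_scattered = r_squared_from_residuals(xs, scattered_ys)
print(f"tight: {r2_tight:.3f}, scattered: {r2_scattered:.3f}")
```

The longer the residuals, the larger the residual sum of squares, and the lower the resulting value, which is what the two diagrams show visually.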

Calculating the Coefficient of Determination

The easiest way to calculate $r^2$ is by squaring the Correlation Coefficient ($r$) that we learned about earlier.

$$r^2 = (r)^2$$

So, if you've already calculated the value of $r$, just square it!

Since the value of $r$ is always between -1 and +1 ($-1 \le r \le 1$), the value of $r^2$ will always be between 0 and 1.

$$0 \le r^2 \le 1$$
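As a quick check of that range, the short sketch below squares a handful of correlation coefficients. The values in `r_values` are hypothetical, chosen only to span the interval from -1 to +1.

```python
# Hypothetical correlation coefficients spanning the allowed range [-1, 1].
r_values = [-1.0, -0.9, -0.3, 0.0, 0.5, 0.9, 1.0]

for r in r_values:
    r_squared = r ** 2  # squaring discards the sign of r
    print(f"r = {r:+.1f}  ->  r^2 = {r_squared:.2f}")

r_squared_values = [r ** 2 for r in r_values]
```

Note that $r = -0.9$ and $r = 0.9$ give the same $r^2$: the coefficient of determination says how much variation is explained, not whether the relationship is positive or negative.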

Mathematically (using Sum of Squares):

The value of $r^2$ can also be calculated directly using the Sum of Squares values used to calculate $r$:

$$r^2 = \frac{(SS_{xy})^2}{SS_{xx} \, SS_{yy}}$$
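The Sum of Squares formula translates almost line for line into code. The sketch below assumes the usual definitions of $SS_{xx}$, $SS_{yy}$, and $SS_{xy}$ as sums of squared (or cross-multiplied) deviations from the means. The five data points and the function name are hypothetical.

```python
import math


def sums_of_squares(xs, ys):
    """Return (SS_xx, SS_yy, SS_xy) as sums of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return ss_xx, ss_yy, ss_xy


# Hypothetical sample of five (x, y) pairs.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

ss_xx, ss_yy, ss_xy = sums_of_squares(xs, ys)

# Direct formula: r^2 = (SS_xy)^2 / (SS_xx * SS_yy)
r_squared = ss_xy ** 2 / (ss_xx * ss_yy)

# Cross-check: compute r first, then square it.
r = ss_xy / math.sqrt(ss_xx * ss_yy)

print(f"SS_xx = {ss_xx}, SS_yy = {ss_yy}, SS_xy = {ss_xy}")
print(f"r^2 (direct) = {r_squared:.2f}, r^2 (from r) = {r ** 2:.2f}")
```

For this sample, $SS_{xx} = 10$, $SS_{yy} = 6$, and $SS_{xy} = 6$, so $r^2 = 36 / 60 = 0.6$, and both routes agree.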

Interpretation as a Percentage

The value of $r^2$ is often converted into a percentage (by multiplying by 100) for easier interpretation.

  • If $r^2 = 0.81$, it means that 81% of the total variation in variable Y can be explained by the variation in variable X through the linear regression model.
  • The remaining variation ($1 - r^2$, or 19% in this example) is not explained by the model. It is attributed to other factors not included in the model (other variables, or random error).

The higher the $r^2$ percentage, the better our linear regression model is at explaining the variation in Y in terms of X.
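The percentage reading can be sketched as a small helper. The function name `explained_and_unexplained` is hypothetical, and 0.81 is the example value from the text.

```python
def explained_and_unexplained(r_squared):
    """Return (explained %, unexplained %) for an r^2 between 0 and 1."""
    if not 0 <= r_squared <= 1:
        raise ValueError("r^2 must lie between 0 and 1")
    explained = r_squared * 100          # share of variation in Y explained by X
    unexplained = (1 - r_squared) * 100  # share left to other factors or random error
    return explained, unexplained


explained, unexplained = explained_and_unexplained(0.81)
print(f"explained: {explained:.0f}%, unexplained: {unexplained:.0f}%")
# prints "explained: 81%, unexplained: 19%"
```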