Product Moment Correlation

What Is Product Moment Correlation?

Product Moment Correlation, often called Pearson correlation and denoted by $r$, is the most commonly used statistical measure of how strong, and in which direction, the linear (straight-line) relationship is between two quantitative (numerical) variables.

The value of $r$ tells us whether the two variables tend to move in the same direction (positive), in opposite directions (negative), or show no linear relationship at all.

Correlation from Scatter Diagrams

The most intuitive way to understand the value of $r$ is to look at how the data points are scattered on a diagram:

Strong Positive Correlation

$r$ approaching +1 means both variables tend to move in the same direction.

Figure: Example of Strong Positive Correlation. Data points cluster very closely, forming an upward-sloping straight-line pattern.

If your data points look like this (rising from bottom left to top right and tightly clustered), the value of $r$ will be close to +1.

Weak Positive Correlation

$r$ being positive but close to 0 means both variables tend to move in the same direction, but not very strongly.

Figure: Example of Weak Positive Correlation. Points tend to rise, but are spread further from the straight line.

If the points still show an upward trend but are more scattered like this, the value of $r$ is positive but smaller (closer to 0).

Strong Negative Correlation

$r$ approaching -1 means both variables tend to move in opposite directions.

Figure: Example of Strong Negative Correlation. Data points cluster very closely, forming a downward-sloping straight-line pattern.

If the points fall from top left to bottom right and are very tightly clustered, the value of $r$ will be close to -1.

No Linear Correlation

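$r$ approaching 0 means the two variables have little or no linear relationship.

Figure: Example of No Linear Correlation. Data points are scattered randomly without forming a straight-line pattern.

When the points are scattered randomly without a clear linear pattern, the value of $r$ will be close to 0. A value near 0 rules out only a straight-line pattern; the variables may still be related in a curved (nonlinear) way.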
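The four patterns above can be imitated with synthetic data. The following is a minimal sketch, not part of the original lesson: the slopes, noise levels, sample size, and seed are arbitrary assumptions, and `pearson_r` and `noisy_line` are helper names invented for this illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson r computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_xx = sum((x - mean_x) ** 2 for x in xs)
    ss_yy = sum((y - mean_y) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [rng.uniform(0, 10) for _ in range(500)]


def noisy_line(slope, noise_sd):
    """y = slope * x plus Gaussian noise with standard deviation noise_sd."""
    return [slope * x + rng.gauss(0, noise_sd) for x in xs]


# Assumed settings: small noise gives tight clustering, large noise a loose cloud.
patterns = {
    "strong positive": noisy_line(1.0, 0.5),
    "weak positive": noisy_line(1.0, 10.0),
    "strong negative": noisy_line(-1.0, 0.5),
    "no linear": [rng.gauss(0, 3) for _ in xs],  # y ignores x entirely
}

results = {name: pearson_r(xs, ys) for name, ys in patterns.items()}
for name, r in results.items():
    print(f"{name:16s} r = {r:+.2f}")
```

With these settings the tightly clustered lines should give values near +1 and -1, the noisy upward trend a small positive value, and the unrelated data a value near 0. The exact figures depend on the random draw.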

How is r Calculated?

The Pearson correlation coefficient ($r$) essentially measures how closely two variables (X and Y) move together relative to their own individual variation.

Imagine this:

  1. Individual Variation:

    Each variable (X and Y) has its own variability. A variable's values may fluctuate a lot (large variation) or stay fairly stable (small variation). This is measured by $SS_{xx}$ for X and $SS_{yy}$ for Y (formulas below).

  2. Joint Variation (Covariance):

    We also need to know how X and Y vary together. When X increases, does Y also tend to increase? Or decrease? This joint variation is captured by $SS_{xy}$, the sum of products of deviations. Dividing $SS_{xy}$ by $n - 1$ gives the sample covariance, so the two always share the same sign.

    • If $SS_{xy}$ is large and positive: X and Y often move in the same direction.
    • If $SS_{xy}$ is large and negative: X and Y often move in opposite directions.
    • If $SS_{xy}$ is close to zero: No clear pattern of joint movement.
  3. Standardizing the Measure:

    The problem is that the value of $SS_{xy}$ (and therefore of the covariance) is heavily influenced by the units of the data. For example, the covariance between height (cm) and weight (kg) will have a different value if we measure height in meters and weight in grams, even though the relationship is the same.

    To overcome this, we need to standardize the measure. This is done by dividing $SS_{xy}$ by a combined measure of the individual variations, $\sqrt{SS_{xx} SS_{yy}}$, which carries the same units and so cancels them out.

    $$r = \frac{\text{How much X and Y vary together}}{\text{Combined individual variation of X and Y}} = \frac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}}$$

The result of this division is $r$, the Pearson Correlation Coefficient. Because it is standardized, its value always lies between -1 and +1 (inclusive), regardless of the original data units. This allows us to compare the strength of linear relationships between different pairs of variables.
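One way to see why this bound holds, offered here as a supplementary sketch rather than as part of the original explanation, is the Cauchy-Schwarz inequality applied to the deviations $a_i = x_i - \bar{x}$ and $b_i = y_i - \bar{y}$. The index $i$ and the names $a_i$, $b_i$ are introduced only for this sketch, and both $SS_{xx}$ and $SS_{yy}$ are assumed to be nonzero.

$$
\left(\sum_i a_i b_i\right)^2 \le \left(\sum_i a_i^2\right)\left(\sum_i b_i^2\right)
\;\Longrightarrow\;
SS_{xy}^2 \le SS_{xx}\, SS_{yy}
\;\Longrightarrow\;
|r| = \frac{|SS_{xy}|}{\sqrt{SS_{xx}\, SS_{yy}}} \le 1
$$

Equality holds only when the deviations are exactly proportional, that is, when every point lies on a single straight line. This is the case of a perfect correlation of +1 or -1.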

So, the value of $r$ is determined by comparing how strongly X and Y move together with how much they vary individually.
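The effect of units can be checked numerically. The sketch below uses hypothetical height and weight values invented purely for illustration; `sum_of_products` and `pearson_r` are helper names assumed for this example.

```python
import math


def sum_of_products(xs, ys):
    """SS_xy: sum of products of deviations from the means."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def pearson_r(xs, ys):
    """r = SS_xy / sqrt(SS_xx * SS_yy); SS_xx is SS_xy of x with itself."""
    return sum_of_products(xs, ys) / math.sqrt(
        sum_of_products(xs, xs) * sum_of_products(ys, ys)
    )


# Hypothetical measurements, invented purely for illustration.
heights_cm = [150, 160, 165, 172, 180]
weights_kg = [52, 58, 63, 70, 79]

# The same measurements expressed in different units.
heights_m = [h / 100 for h in heights_cm]
weights_g = [w * 1000 for w in weights_kg]

ss_cm_kg = sum_of_products(heights_cm, weights_kg)
ss_m_g = sum_of_products(heights_m, weights_g)
r_cm_kg = pearson_r(heights_cm, weights_kg)
r_m_g = pearson_r(heights_m, weights_g)

print(f"SS_xy in cm and kg: {ss_cm_kg:.3f}")
print(f"SS_xy in m and g:   {ss_m_g:.3f}")
print(f"r in cm and kg:     {r_cm_kg:.6f}")
print(f"r in m and g:       {r_m_g:.6f}")
```

Changing the units rescales $SS_{xy}$ by a factor of $\frac{1}{100} \times 1000 = 10$, while the two values of $r$ agree up to floating-point rounding.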

Product Moment Correlation Formula

To calculate the value of $r$ precisely, we use formulas involving the Sum of Squares:

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx} SS_{yy}}}$$

What are $SS_{xy}$, $SS_{xx}$, and $SS_{yy}$?

These measure how varied our data is:

  1. $SS_{xx}$ (Sum of Squares for x): Measures how spread out the x data is from its mean.

    $$SS_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$
  2. $SS_{yy}$ (Sum of Squares for y): Measures how spread out the y data is from its mean.

    $$SS_{yy} = \sum (y - \bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n}$$
  3. $SS_{xy}$ (Sum of Products of deviations for x and y): Measures how x and y vary together.

    $$SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}$$

Key:

  • $n$: Number of data pairs (x, y).
  • $\sum x$, $\sum y$: Sum of all x values and sum of all y values.
  • $\sum x^2$, $\sum y^2$: Sum of the squares of each x value and of each y value.
  • $\sum xy$: Sum of the products of each x and y pair.
  • $\bar{x}$, $\bar{y}$: Mean of the x values and mean of the y values.

By calculating these three SS values and plugging them into the formula for $r$, we get the Product Moment Correlation Coefficient.
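As a worked illustration of these formulas, the sketch below computes each sum of squares twice, once with the shortcut form and once from deviations about the mean, for a small made-up data set. The data values and variable names are assumptions chosen so the arithmetic can be followed by hand.

```python
import math

# Small made-up data set, chosen so the arithmetic is easy to check by hand.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

# Raw sums listed in the key.
sum_x, sum_y = sum(xs), sum(ys)              # 15, 20
sum_x2 = sum(x * x for x in xs)              # 55
sum_y2 = sum(y * y for y in ys)              # 86
sum_xy = sum(x * y for x, y in zip(xs, ys))  # 66

# Shortcut (computational) form of each quantity.
ss_xx = sum_x2 - sum_x ** 2 / n
ss_yy = sum_y2 - sum_y ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n

# Definitional form, from deviations about the means, as a cross-check.
mean_x, mean_y = sum_x / n, sum_y / n
ss_xx_dev = sum((x - mean_x) ** 2 for x in xs)
ss_yy_dev = sum((y - mean_y) ** 2 for y in ys)
ss_xy_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

r = ss_xy / math.sqrt(ss_xx * ss_yy)

print(ss_xx, ss_yy, ss_xy)  # 10.0 6.0 6.0
print(round(r, 4))          # 0.7746
```

Both forms give $SS_{xx} = 10$, $SS_{yy} = 6$, and $SS_{xy} = 6$, so $r = 6 / \sqrt{60} \approx 0.77$. The shortcut form needs only running totals, although on large or widely ranging data it can lose precision compared with the deviation form.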

Interpreting the Value of r

Once we have the value of $r$, we can interpret its strength and direction using the following general guidelines:

| Value of $r$ | Correlation Strength | Description |
| --- | --- | --- |
| $1$ | Perfect Positive | All points lie exactly on an upward-sloping line. |
| $0.7 \le r < 1$ | Strong Positive | Clear and strong positive linear relationship. |
| $0.3 < r < 0.7$ | Moderate Positive | Moderately visible positive linear relationship. |
| $0 < r \le 0.3$ | Weak Positive | Very low positive linear relationship. |
| $0$ | No Linear Correlation | No linear relationship at all. |
| $-0.3 \le r < 0$ | Weak Negative | Very low negative linear relationship. |
| $-0.7 < r < -0.3$ | Moderate Negative | Moderately visible negative linear relationship. |
| $-1 < r \le -0.7$ | Strong Negative | Clear and strong negative linear relationship. |
| $-1$ | Perfect Negative | All points lie exactly on a downward-sloping line. |
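The guidelines can be turned into a small lookup. This is a sketch under the thresholds listed above, which are general conventions rather than fixed rules; the function name `describe_correlation` is hypothetical.

```python
def describe_correlation(r):
    """Return the strength label for a given r, using the guideline thresholds."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "No Linear Correlation"

    direction = "Positive" if r > 0 else "Negative"
    size = abs(r)  # the guideline bands are symmetric around zero
    if size == 1:
        strength = "Perfect"
    elif size >= 0.7:
        strength = "Strong"
    elif size > 0.3:
        strength = "Moderate"
    else:
        strength = "Weak"
    return f"{strength} {direction}"


for value in (1, 0.85, 0.5, 0.3, 0, -0.2, -0.7, -1):
    print(f"{value:+.2f} -> {describe_correlation(value)}")
```

Values computed from real data will rarely be exactly 0, +1, or -1, so in practice the "Perfect" and "No Linear Correlation" labels mostly serve as reference points.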