Variance and Standard Deviation for Grouped Data

Calculating Spread for Grouped Data

How do we measure the spread of data presented in a grouped frequency table? Consider, for example, data on phone battery duration grouped into intervals of hours (6-10 hours, 11-15 hours, and so on).

Since we don't know the exact value of each data point within a class interval (e.g., in the 11-15 hours class, we don't know if the duration was exactly 11 hours, 12 hours, or something else), we need to make an assumption.

The most common assumption is that all data within a class interval are evenly distributed. Therefore, we can represent all data in that class using the midpoint ($x_i$) of that class.
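As a small illustration of the midpoint idea, the sketch below averages the stated lower and upper limits of a class. The function name `class_midpoint` is a hypothetical helper introduced here, not something from the lesson.

```python
def class_midpoint(lower, upper):
    """Return the midpoint of a class interval given its stated limits."""
    return (lower + upper) / 2


# Class limits taken from the battery-duration intervals mentioned above.
print(class_midpoint(6, 10))   # 8.0
print(class_midpoint(11, 15))  # 13.0
```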

Formulas for Variance and Standard Deviation of Grouped Data

Using the midpoint ($x_i$) and frequency ($f$) of each class, the formulas differ slightly from those for ungrouped data:

  1. Variance ($\sigma^2$): The commonly used (and easier to compute) formula is the computational formula adapted for grouped data:

    $$\sigma^2 = \frac{\sum (f \cdot x_i^2)}{\sum f} - \left( \frac{\sum (f \cdot x_i)}{\sum f} \right)^2$$

    This formula essentially calculates the average of the squared midpoints weighted by frequency, minus the square of the average midpoint weighted by frequency (the mean of the grouped data).

  2. Standard Deviation ($\sigma$): Just like with ungrouped data, the standard deviation is the square root of the variance:

    $$\sigma = \sqrt{\sigma^2}$$
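The two formulas above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function names `grouped_variance` and `grouped_std` are hypothetical, and the toy input at the end is made up purely to show the calling convention. Because the formula divides by $\sum f$, it gives the population form of the variance.

```python
import math


def grouped_variance(midpoints, frequencies):
    """Population variance of grouped data via the computational formula."""
    n = sum(frequencies)                                             # sum of f
    sum_fx = sum(f * x for x, f in zip(midpoints, frequencies))      # sum of f * x_i
    sum_fx2 = sum(f * x**2 for x, f in zip(midpoints, frequencies))  # sum of f * x_i^2
    return sum_fx2 / n - (sum_fx / n) ** 2


def grouped_std(midpoints, frequencies):
    """Standard deviation: the square root of the grouped variance."""
    return math.sqrt(grouped_variance(midpoints, frequencies))


# Toy input made up for illustration only: two classes with midpoints 1 and 3,
# one observation each. The mean is 2, so the variance is (1 + 9)/2 - 2**2 = 1.
print(grouped_variance([1, 3], [1, 1]))  # 1.0
print(grouped_std([1, 3], [1, 1]))       # 1.0
```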

Calculating Variance and Standard Deviation of Phone Battery Duration

Suppose a study on phone battery duration yielded the following data:

| Battery duration (hours) | Frequency ($f$) |
| --- | --- |
| 6-10 | 2 |
| 11-15 | 10 |
| 16-20 | 18 |
| 21-25 | 45 |
| 26-30 | 5 |

Let's determine the variance and standard deviation for this battery duration data.

Create a Helper Table

We need to calculate the midpoint ($x_i$) for each class, then compute $f \cdot x_i$ and $f \cdot x_i^2$.

| Battery duration (hours) | Midpoint, $x_i$ | Frequency, $f$ | $f \cdot x_i$ | $f \cdot x_i^2$ |
| --- | --- | --- | --- | --- |
| 6-10 | $\frac{6+10}{2}=8$ | 2 | $2 \times 8 = 16$ | $2 \times 8^2 = 128$ |
| 11-15 | $\frac{11+15}{2}=13$ | 10 | $10 \times 13 = 130$ | $10 \times 13^2 = 1690$ |
| 16-20 | $\frac{16+20}{2}=18$ | 18 | $18 \times 18 = 324$ | $18 \times 18^2 = 5832$ |
| 21-25 | $\frac{21+25}{2}=23$ | 45 | $45 \times 23 = 1035$ | $45 \times 23^2 = 23805$ |
| 26-30 | $\frac{26+30}{2}=28$ | 5 | $5 \times 28 = 140$ | $5 \times 28^2 = 3920$ |
| Total | | $\sum f = 80$ | $\sum f x_i = 1645$ | $\sum f x_i^2 = 35375$ |
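The helper table can also be built programmatically. The sketch below uses the class limits and frequencies from the table above; the variable names (`classes`, `rows`, `total_fx`, and so on) are illustrative choices, not part of the lesson.

```python
# Class limits and frequencies copied from the battery-duration table.
classes = [(6, 10), (11, 15), (16, 20), (21, 25), (26, 30)]
frequencies = [2, 10, 18, 45, 5]

rows = []
for (lower, upper), f in zip(classes, frequencies):
    x = (lower + upper) / 2  # class midpoint x_i
    rows.append((f"{lower}-{upper}", x, f, f * x, f * x**2))

# Column totals needed by the variance formula.
total_f = sum(row[2] for row in rows)
total_fx = sum(row[3] for row in rows)
total_fx2 = sum(row[4] for row in rows)

for label, x, f, fx, fx2 in rows:
    print(f"{label:>6} {x:>6.0f} {f:>4} {fx:>7.0f} {fx2:>8.0f}")
print(f"Totals: sum f = {total_f}, sum f*x = {total_fx:.0f}, sum f*x^2 = {total_fx2:.0f}")
```

The final line should report the same totals as the table: 80, 1645, and 35375.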

Calculate Variance

Plug the total values from the table into the variance formula:

$$
\begin{aligned}
\sigma^2 &= \frac{\sum (f \cdot x_i^2)}{\sum f} - \left( \frac{\sum (f \cdot x_i)}{\sum f} \right)^2 \\
&= \frac{35375}{80} - \left( \frac{1645}{80} \right)^2 \\
&= 442.1875 - (20.5625)^2 \\
&= 442.1875 - 422.81640625 \\
&\approx 19.37
\end{aligned}
$$

So, the variance of the battery duration data is approximately 19.37 (in units of hours squared).
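As a cross-check, the sketch below recomputes the variance with exact fractions and compares the computational formula against the definitional form $\sigma^2 = \frac{\sum f (x_i - \mu)^2}{\sum f}$. The definitional form is not stated in this lesson; it is brought in here only as an independent check, and the variable names are illustrative.

```python
from fractions import Fraction

# Midpoints and frequencies from the helper table.
midpoints = [8, 13, 18, 23, 28]
frequencies = [2, 10, 18, 45, 5]

n = sum(frequencies)
mean = Fraction(sum(f * x for x, f in zip(midpoints, frequencies)), n)

# Computational formula: mean of squared midpoints minus squared mean.
var_computational = (
    Fraction(sum(f * x**2 for x, f in zip(midpoints, frequencies)), n) - mean**2
)

# Definitional formula: frequency-weighted mean of squared deviations.
var_definitional = sum(
    f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies)
) / n

print(float(mean))                            # 20.5625
print(float(var_computational))               # 19.37109375
print(var_computational == var_definitional)  # True
```

Exact fractions avoid floating-point rounding, so the equality test between the two forms is meaningful.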

Calculate Standard Deviation

Take the square root of the variance:

$$\sigma = \sqrt{19.37} \approx 4.4$$

The standard deviation of the phone battery duration is approximately 4.4 hours. This tells us that battery durations typically differ from the mean (which is $\frac{1645}{80} \approx 20.56$ hours) by about 4.4 hours. Strictly speaking, the standard deviation is the root-mean-square deviation from the mean, not the simple average of the deviations.
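One way to sanity-check the result is to apply the midpoint assumption literally: replace every observation with its class midpoint and pass the expanded list to Python's standard `statistics` module. This is a sketch under that assumption; the name `expanded` is illustrative.

```python
import statistics

# Midpoints and frequencies from the helper table.
midpoints = [8, 13, 18, 23, 28]
frequencies = [2, 10, 18, 45, 5]

# Replace each observation with its class midpoint, repeated f times.
expanded = [x for x, f in zip(midpoints, frequencies) for _ in range(f)]

mean = statistics.fmean(expanded)    # arithmetic mean of the 80 values
sigma = statistics.pstdev(expanded)  # population standard deviation

print(mean)             # 20.5625
print(round(sigma, 2))  # 4.4
```

`pstdev` divides by the number of observations, which matches the $\sum f$ denominator used in this lesson; `statistics.stdev` would instead divide by one less and give a slightly larger value.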

The smaller the standard deviation, the more uniform the phone battery durations were in the study.