

Quartiles for Grouped Data

How to Find Quartiles in Grouped Data

For single (ungrouped) data, we sort the values and locate the quartile positions directly. When the data is grouped in a frequency table (for example, test scores grouped as 70-79, 80-89, and so on), the method is slightly different. We don't know the exact value of each data point, only how many data points fall in each group (class interval).

As with the median for grouped data, we find the quartiles ($Q_1$, $Q_2$, $Q_3$) by interpolation: we estimate where each quartile lies within the class interval that contains it.

The position of each quartile in the ordered data is:

  • Position of $Q_1$ = the $\frac{1}{4}n$-th data point
  • Position of $Q_2$ = the $\frac{2}{4}n$-th data point (or $\frac{1}{2}n$-th)
  • Position of $Q_3$ = the $\frac{3}{4}n$-th data point

Where $n$ is the total number of data points.

Steps to Find the Value of Quartiles for Grouped Data

Suppose we have shoe sales data from Store A, recorded in a grouped frequency table.

Create a Cumulative Frequency Table

First, we need a frequency table with a cumulative frequency column ($F_k$). The cumulative frequency of a class is the sum of the frequencies from the first class up to and including that class. We use it to find which class each quartile falls into.

For example, here is the shoe sales table:

| Shoe Size | Frequency ($f$) | Cumulative Frequency ($F_k$) | Lower Boundary ($T_b$) | Upper Boundary ($T_a$) | Class Width ($p$) |
|---|---|---|---|---|---|
| 37-39 | 2 | 2 | 36.5 | 39.5 | 3 |
| 40-42 | 11 | 13 | 39.5 | 42.5 | 3 |
| 43-45 | 10 | 23 | 42.5 | 45.5 | 3 |
| 46-48 | 5 | 28 | 45.5 | 48.5 | 3 |
| 49-51 | 2 | 30 | 48.5 | 51.5 | 3 |
| Total | 30 | | | | |

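Lower boundary = lower limit - 0.5

Upper boundary = upper limit + 0.5

Class width = upper boundary - lower boundary (for example, 39.5 - 36.5 = 3)

The 0.5 adjustment works here because the class limits are whole numbers, so neighbouring classes are exactly 1 apart.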
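Building this table can be sketched in a few lines of Python. This is a minimal sketch, assuming each class is given as a (lower limit, upper limit, frequency) tuple with whole-number limits; the names `classes` and `build_table` are hypothetical and not part of the original material.

```python
from itertools import accumulate

# Shoe sales for Store A: (lower limit, upper limit, frequency) per class.
classes = [(37, 39, 2), (40, 42, 11), (43, 45, 10), (46, 48, 5), (49, 51, 2)]

def build_table(classes):
    """Return one row per class with F_k, T_b, T_a and class width p.

    Assumes whole-number class limits, so each boundary sits 0.5 outside its limit.
    """
    cumulative = accumulate(f for _, _, f in classes)  # running total of frequencies (F_k)
    rows = []
    for (lo, hi, f), f_k in zip(classes, cumulative):
        t_b = lo - 0.5  # lower boundary
        t_a = hi + 0.5  # upper boundary
        rows.append({"class": f"{lo}-{hi}", "f": f, "F_k": f_k,
                     "T_b": t_b, "T_a": t_a, "p": t_a - t_b})
    return rows

for row in build_table(classes):
    print(row)
```

The last row's $F_k$ equals the total $n$, which is a quick sanity check that no class was skipped.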

Determine the Quartile Class Position

Next, find the position of each quartile in the ordered data.

Total data ($n$) = 30.

  • Position of $Q_1$: the $\frac{1}{4} \times 30 = 7.5$-th data point.

    Look at the $F_k$ column. Which class contains the 7.5th data point? The first class has $F_k = 2$ (not enough). The second class has $F_k = 13$ (it holds data points 3 through 13). So the 7.5th data point is in the 40-42 class.

  • Position of $Q_2$ (Median): the $\frac{1}{2} \times 30 = 15$-th data point.

    Look at $F_k$. The 15th data point is in the 43-45 class (the previous class's $F_k$ is 13, and this class's $F_k$ is 23).

  • Position of $Q_3$: the $\frac{3}{4} \times 30 = 22.5$-th data point.

    Look at $F_k$. The 22.5th data point is also in the 43-45 class, since $13 < 22.5 \leq 23$.
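The lookup above, "find the first class whose cumulative frequency reaches the position", can be sketched with a binary search over the $F_k$ column. This is an illustrative sketch, not the article's own method; `frequencies`, `labels` and `quartile_class` are hypothetical names.

```python
from bisect import bisect_left
from itertools import accumulate

frequencies = [2, 11, 10, 5, 2]            # f for classes 37-39, 40-42, 43-45, 46-48, 49-51
labels = ["37-39", "40-42", "43-45", "46-48", "49-51"]
cumulative = list(accumulate(frequencies))  # F_k column
n = cumulative[-1]                          # total number of data points

def quartile_class(i):
    """Return (position, class label) for quartile Q_i, with i in {1, 2, 3}."""
    position = i * n / 4
    # Index of the first class whose cumulative frequency is >= position.
    index = bisect_left(cumulative, position)
    return position, labels[index]

for i in (1, 2, 3):
    print(f"Q{i}", *quartile_class(i))
```

`bisect_left` is used so that a position exactly equal to some $F_k$ (say, the 13th data point) is assigned to the class that ends there, which matches reading the table by hand.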

Calculate the Quartile Value using the Interpolation Formula

Once we know the class, we estimate the quartile value with this interpolation formula:

$$Q_i = T_b + \left( \frac{\frac{i}{4}n - F_{kum}}{f_i} \right) p$$

Where:

  • $Q_i$ = Value of the $i$-th quartile (what we're looking for)
  • $T_b$ = Lower boundary of the $i$-th quartile class
  • $n$ = Total frequency
  • $F_{kum}$ = Cumulative frequency of the class just BEFORE the $i$-th quartile class
  • $f_i$ = Frequency of the $i$-th quartile class
  • $p$ = Class width
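Put together, the formula translates directly into a short function. This is a minimal sketch, assuming all classes share the same width $p$ and that the lower boundaries and frequencies are listed in class order; `grouped_quartile` and its parameter names are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

def grouped_quartile(i, lower_boundaries, frequencies, width):
    """Estimate Q_i from a grouped frequency table by linear interpolation.

    lower_boundaries[k] is T_b of class k, frequencies[k] is its f,
    and width is the common class width p.
    """
    cumulative = list(accumulate(frequencies))  # F_k for each class
    n = cumulative[-1]                          # total frequency
    position = i * n / 4                        # (i/4) n
    k = bisect_left(cumulative, position)       # index of the Q_i class
    f_kum = cumulative[k - 1] if k > 0 else 0   # F_kum: cumulative frequency before that class
    return lower_boundaries[k] + (position - f_kum) / frequencies[k] * width

t_b = [36.5, 39.5, 42.5, 45.5, 48.5]
f = [2, 11, 10, 5, 2]
for i in (1, 2, 3):
    print(f"Q{i} ≈ {grouped_quartile(i, t_b, f, 3):.2f}")
```

Floating-point arithmetic can leave tiny rounding errors (for example, 45.349999... instead of 45.35), which is why the result is printed to two decimals.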

Finding Q1 for Shoe Sales

Let's calculate $Q_1$ from the table above.

  1. Position of $Q_1$: the 7.5th data point.

  2. Class of $Q_1$: 40-42.

  3. Let's gather the ingredients:

    • Lower boundary of the $Q_1$ class ($T_b$) = 39.5
    • Total data ($n$) = 30
    • Cumulative frequency before the $Q_1$ class ($F_{kum}$) = 2 (the $F_k$ of class 37-39)
    • Frequency of the $Q_1$ class ($f_1$) = 11
    • Class width ($p$) = 3
  4. Plug into the formula:

    $Q_1 = T_b + \left( \frac{\frac{1}{4}n - F_{kum}}{f_1} \right) p$
    $Q_1 = 39.5 + \left( \frac{7.5 - 2}{11} \right) 3$
    $Q_1 = 39.5 + \left( \frac{5.5}{11} \right) 3$
    $Q_1 = 39.5 + (0.5) \times 3$
    $Q_1 = 39.5 + 1.5$
    $Q_1 = 41$

So the value of $Q_1$ is 41. This means an estimated 25% of the shoes sold are size 41 or smaller.
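The median $Q_2$ follows the same steps. Its position is the 15th data point, in class 43-45, so $T_b = 42.5$, $F_{kum} = 13$, $f_2 = 10$ and $p = 3$. This worked step is added for illustration and is not part of the original example:

$$
\begin{aligned}
Q_2 &= T_b + \left( \frac{\frac{2}{4}n - F_{kum}}{f_2} \right) p \\
    &= 42.5 + \left( \frac{15 - 13}{10} \right) 3 \\
    &= 42.5 + (0.2) \times 3 \\
    &= 42.5 + 0.6 \\
    &= 43.1
\end{aligned}
$$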

Exercise

Try calculating $Q_3$ from the shoe sales data in the table above.

Then compare your result with the method for finding quartiles of single data covered earlier. How do the two methods differ, and why might their results be similar or different?

Answer Key

  1. Position of $Q_3$: the 22.5th data point.

  2. Class of $Q_3$: 43-45.

  3. Gather the ingredients:

    • $T_b$ = 42.5 (lower boundary of the $Q_3$ class)
    • $n$ = 30 (total data)
    • $F_{kum}$ = 13 (the $F_k$ of class 40-42)
    • $f_3$ = 10 (frequency of the $Q_3$ class)
    • $p$ = 3 (class width)
  4. Plug into the formula:

    $Q_3 = T_b + \left( \frac{\frac{3}{4}n - F_{kum}}{f_3} \right) p$
    $Q_3 = 42.5 + \left( \frac{22.5 - 13}{10} \right) 3$
    $Q_3 = 42.5 + \left( \frac{9.5}{10} \right) 3$
    $Q_3 = 42.5 + (0.95) \times 3$
    $Q_3 = 42.5 + 2.85$
    $Q_3 = 45.35$

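So the value of $Q_3$ is 45.35. This means an estimated 75% of the shoes sold are size 45.35 or smaller (equivalently, about 25% are larger). Since shoe sizes are whole numbers, 45.35 is an interpolated estimate rather than a size that was actually sold.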

Comparison with Single Data:

For grouped data, we use interpolation because we only know which class interval each data point falls into, not its exact value. The result is therefore an estimate of the quartile.

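For single data, we can point directly to the data point that is the quartile (or average two neighbouring data points), so the result is exact for the given values. The two methods can give different results because interpolation assumes the values are spread evenly across each class interval, which the real data may not be. Quartiles for grouped data are still useful for summarizing large datasets that have already been grouped.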
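To see how the two approaches can disagree, the sketch below makes a purely hypothetical assumption: every shoe in a class is treated as having the class midpoint size (38, 41, 44, 47, 50). It then applies Python's `statistics.quantiles` to that invented ungrouped list. Note that `statistics.quantiles` uses the "exclusive" method by default, which interpolates at positions $\frac{i}{4}(n + 1)$; this may not match the single-data convention taught earlier, and the midpoint data is not real sales data.

```python
import statistics

# HYPOTHETICAL: each class collapsed to its midpoint; the real sizes are unknown.
midpoints = [38, 41, 44, 47, 50]
frequencies = [2, 11, 10, 5, 2]
data = [m for m, f in zip(midpoints, frequencies) for _ in range(f)]  # 30 values

# Default method="exclusive" interpolates at positions (n + 1) * i / 4.
q1, q2, q3 = statistics.quantiles(data, n=4)
print(q1, q2, q3)
```

Under this assumption the quartiles come out as 41.0, 44.0 and 44.75. $Q_1$ agrees with the grouped estimate of 41, but $Q_2$ and $Q_3$ differ from the interpolated values (about 43.1 and 45.35), because stacking every value at a midpoint is a different assumption from spreading them evenly across each interval.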