What Is a Scatter Diagram?
A scatter diagram is a chart that shows the relationship between two numerical variables, with one variable on each axis. For example, we might plot study time on the X-axis and exam scores on the Y-axis to see how the two are related.
Each point on the diagram represents one pair of data (e.g., one student's data). By looking at the pattern of the points, we can understand their relationship.
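To make the idea of "one point per pair" concrete, here is a minimal sketch that draws a rough text-only scatter diagram using just the Python standard library. The student data and the `ascii_scatter` helper are made up for illustration; in practice a plotting library would normally be used instead.

```python
# Hypothetical data: (study hours, exam score) for six students.
students = [(1, 52), (2, 58), (3, 65), (4, 70), (5, 78), (6, 85)]

def ascii_scatter(points, width=30, height=10):
    """Return a rough text scatter plot: X runs left to right, Y bottom to top."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    # Guard against a zero range when all values on an axis are equal.
    x_span = (max(xs) - x_min) or 1
    y_span = (max(ys) - y_min) or 1

    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # Row 0 is printed first, so flip the row to put large Y values on top.
        grid[height - 1 - row][col] = "*"

    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width

plot = ascii_scatter(students)
print(plot)
```

Each `*` is one student. Because both values rise together in this sample data, the stars run from the bottom left to the top right.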
When Should a Scatter Diagram Be Used?
A scatter diagram is most suitable when we want to:
- See whether there is a relationship (correlation) between two numerical variables. (Example: the relationship between height and weight, or between study time and exam scores.)
- See the pattern of that relationship (positive, negative, or no clear pattern).
This differs from other diagrams:
- Bar Chart: Good for comparing quantities or values between categories (e.g., number of students per class).
- Line Chart: Good for seeing trends in data over time or a specific sequence (e.g., daily temperature changes).
- Pie Chart: Good for showing proportions or parts of a whole (e.g., percentage of favorite fruit types).
So, if your main focus is seeing the relationship between two sets of numbers, a scatter diagram is the right choice!
Scatter Diagram Examples and Correlation Patterns
Let's look at some examples of scatter diagrams with different patterns:
Positive Correlation
If the points tend to rise from the bottom left to the top right, it means there is a positive correlation. As the value of X increases, the value of Y also tends to increase.
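A rising pattern can also be checked with a number. One common measure is the Pearson correlation coefficient r, which ranges from -1 to 1; values close to 1 indicate a strong positive linear relationship. The sketch below is one way to compute it by hand, and both the `pearson_r` helper and the data are illustrative assumptions, not part of the original examples.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator) and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: study hours and exam scores for six students.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 58, 65, 70, 78, 85]

r = pearson_r(hours, scores)
print(f"r = {r:.3f}")  # a value close to 1 for this sample
```

Note that r only measures how well the points follow a straight line, so it is worth looking at the diagram itself as well as the number.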
Negative Correlation
If the points tend to fall from the top left to the bottom right, it means there is a negative correlation. As the value of X increases, the value of Y tends to decrease.
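Another way to confirm a falling pattern is to fit a straight line through the points and look at its slope: a negative slope means Y tends to drop as X grows. The following is a small sketch of an ordinary least-squares slope, using a hypothetical dataset (days absent versus exam score) and a hypothetical helper name.

```python
def least_squares_slope(xs, ys):
    """Slope of the best-fit straight line through the points (xs[i], ys[i])."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    if denominator == 0:
        raise ValueError("slope is undefined when all X values are equal")
    return numerator / denominator

# Hypothetical data: days absent and exam scores for six students.
absences = [0, 1, 2, 3, 4, 5]
scores = [90, 84, 80, 71, 66, 59]

slope = least_squares_slope(absences, scores)
print(f"slope = {slope:.2f} points per day absent")
```

In this made-up sample the slope is negative, which matches points falling from the top left to the bottom right.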
No Correlation (with 2 Groups)
If the points are scattered randomly without a clear pattern, it means there is no correlation or the correlation is very weak. A single diagram can also show several groups of data at once, for example by giving each group its own color or marker.
So, by looking at the distribution pattern of the points on a scatter diagram, we can get an initial idea of how two variables are related, even for different groups of data.