Correlation and Further Interpretation of the Correlation Coefficient
Note: if you cannot remember how to calculate and interpret Pearson’s correlation coefficient, revise 2.6 Pearson’s Correlation Coefficient.
- Two variables which share a statistically meaningful association are said to be correlated. In Further Maths, “statistically meaningful” means their Pearson’s correlation coefficient indicates an association (r \geq 0.25 or r \leq -0.25).
- The strength of a correlation is the same as the strength (weak, moderate or strong) indicated by Pearson’s correlation coefficient.
- Correlation does not mean causation. Keep in mind that correlation is purely statistical and more information is needed to know the nature of the relationship between two variables (this concept is explored further in 2.8 Non-Causal Relationships).
- In a causal relationship, the response variable is affected by the explanatory variable. This could mean that increasing the value of one variable consequently increases or decreases the other, or that one event occurring leads to the other.
Example: the number of people using sunscreen and temperature have a causal relationship, as increasing temperatures will lead to more people using sunscreen.
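The 0.25 cut-off described above can be turned into a small check. The following is a minimal Python sketch, not an official method: the data values are hypothetical, and the 0.5 and 0.75 boundaries between weak, moderate and strong are assumptions here (confirm them against 2.6 Pearson’s Correlation Coefficient). Only the 0.25 threshold comes from these notes.

```python
def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def describe_correlation(r):
    """Classify r using the 0.25 cut-off from these notes.

    The 0.5 and 0.75 boundaries are assumed, not taken from this section.
    """
    size = abs(r)
    if size < 0.25:
        return "no association"
    if size < 0.5:
        strength = "weak"
    elif size < 0.75:
        strength = "moderate"
    else:
        strength = "strong"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


# Hypothetical data: daily maximum temperature (°C) and number of sunscreen users
temperature = [18, 21, 24, 27, 30, 33, 36]
sunscreen_users = [12, 15, 22, 24, 33, 35, 41]

r = pearson_r(temperature, sunscreen_users)
print(round(r, 2), describe_correlation(r))
```

Remember that a label such as "strong positive" only describes the correlation. It says nothing about whether the relationship is causal.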
- Observation refers to the process of collecting data without affecting the test group.
- This is sometimes the best approach, especially when attempting to determine how people, animals or events act in a “natural setting”.
- However, conclusions drawn from observations are often limited in their accuracy, as a number of factors often cannot be accounted for without interference and control groups.
Example: limitations of observations
Imagine you wanted to determine how mice react to predators so you set up a series of cameras and observe. You find that whenever a stray cat chases the mice, they hide in their nests. You conclude that mice must therefore react to predators by hiding in their nest.
However, is this a fair conclusion? Will they react the same when encountering a predator small enough to fit into their nest? Are they hiding or protecting their nest? What if they aren’t close to their nest? Experimentation is required to answer these questions.
- Experimentation refers to collecting data in a way which affects the test group.
- This is generally the best approach as it allows for more factors to be accounted for.
- Experimentation is always required to definitively determine cause and effect.
- Experimentation often utilises an unchanged or unaffected group known as a control group (e.g. in a pharmaceutical trial, the control group is the group which has not been given the test drug).
Example: the need for experimentation to determine cause and effect
It has been observed that when people begin using more sunscreen, more people tend to contract skin cancer. Your peer suggests, based on this observation, that sunscreen must cause skin cancer. In order to test this hypothesis, you set up an experiment during the summer in which a number of test subjects are separated into two groups: one which uses sunscreen and a control group which does not. Both groups spend the same amount of time in the sun each day.
Once the results are collected, you find that the group using sunscreen was in fact less likely to contract skin cancer than the control group. You conclude that the hypothesis was incorrect and sunscreen does not cause skin cancer.
Note: this relationship is known as a common response relationship, with high temperatures and high UV levels being the common explanatory variable. See notes for 2.8 Non-Causal Relationships to learn more.
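The sunscreen example can be imitated with a small simulation. This is only a sketch under made-up assumptions: the `cancer_risk` formula, the UV range and all of the numbers are hypothetical and chosen purely to illustrate the idea, not taken from real data. In the simulated "observation", UV level drives both sunscreen use and risk. In the simulated "experiment", UV exposure is held the same for both groups and only sunscreen use differs.

```python
import random


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def cancer_risk(uv, sunscreen, rng):
    """Made-up risk score: UV raises it, sunscreen lowers it."""
    return 2 * uv - 0.1 * sunscreen + rng.gauss(0, 2)


rng = random.Random(1)  # fixed seed so the run is repeatable

# Observation: on high-UV days people use more sunscreen AND face more risk.
uv_levels = [rng.uniform(1, 12) for _ in range(500)]
sunscreen = [5 * uv + rng.gauss(0, 5) for uv in uv_levels]
risk = [cancer_risk(uv, s, rng) for uv, s in zip(uv_levels, sunscreen)]
r_observed = pearson_r(sunscreen, risk)

# Experiment: same UV exposure for everyone; only sunscreen use differs.
treated = [cancer_risk(8, 40, rng) for _ in range(250)]
control = [cancer_risk(8, 0, rng) for _ in range(250)]
mean_treated = sum(treated) / len(treated)
mean_control = sum(control) / len(control)

print(f"observed r between sunscreen use and risk: {r_observed:.2f}")
print(f"mean risk with sunscreen: {mean_treated:.1f}, without: {mean_control:.1f}")
```

In this model the observed correlation between sunscreen use and risk is positive even though sunscreen lowers risk by construction. The controlled comparison is what reveals the direction of the effect.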
- Lurking variables are variables which affect the response variable but are not accounted or controlled for.
- Poorly designed experiments won’t account for lurking variables.
- When interpreting data, keep in mind whether there are lurking variables that might affect the result.
- Weak correlation between two variables may indicate the presence of a lurking variable influencing the results.
Example: lurking variables
An experiment is designed which seeks to find an association between a person’s diet and weight. No other factors are accounted for.
In this experiment there are several lurking variables, such as genetics, sex and age, which all influence a person’s weight. Consequently, the results are unlikely to be accurate.
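A rough simulation can show how a lurking variable weakens a correlation. The sketch below is an illustration only: the diet score, the weight formula and the size of each effect are invented, and age is the only lurking variable modelled. It compares the diet and weight correlation across everyone with the same correlation among people of similar age.

```python
import random


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical model: weight depends a little on diet and a lot on age.
people = []
for _ in range(2000):
    diet = rng.uniform(0, 10)   # made-up diet score
    age = rng.uniform(20, 70)   # lurking variable
    weight = 50 + 1.0 * diet + 0.8 * age + rng.gauss(0, 2)
    people.append((diet, age, weight))

# Ignoring age: the diet signal is swamped by the lurking variable.
r_all = pearson_r([p[0] for p in people], [p[2] for p in people])

# Accounting for age: compare only people of similar age (40 up to 45).
similar = [p for p in people if 40 <= p[1] < 45]
r_similar = pearson_r([p[0] for p in similar], [p[2] for p in similar])

print(f"r for everyone: {r_all:.2f}")
print(f"r within one age band: {r_similar:.2f}")
```

Under these assumptions the correlation within one age band should come out noticeably stronger than the overall one, because restricting to similar ages removes most of the variation caused by the lurking variable.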