
FM Association

3.5 Residual Plots and Residual Analysis

Residual Plots

  • A residual is the difference between a data point's actual response variable value and the value the fitted model predicts at the same explanatory variable value (residual = actual value - predicted value).
  • Data points above the fitted line will have a positive residual value and those below the line will have a negative residual value.
  • A residual plot is similar to a scatterplot: each residual is plotted at the same position along the x-axis as its corresponding data point, with the residual value along the y-axis.
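
As a rough illustration of the definitions above, the sketch below computes residuals for a small made-up dataset. The data values and the function name `residuals` are assumptions for illustration only; the line y = 1 + 2x is the least squares line for this particular invented dataset.

```python
def residuals(xs, ys, intercept, slope):
    """Return residual = actual value - predicted value for each data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data (not from the notes); fitted line assumed to be y = 1 + 2x
xs = [0, 1, 2, 3, 4]
ys = [1, 2, 6, 8, 8]

res = residuals(xs, ys, intercept=1, slope=2)

# Each (x, residual) pair is one point on the residual plot
residual_plot_points = list(zip(xs, res))

for x, r in residual_plot_points:
    position = "above" if r > 0 else "below" if r < 0 else "on"
    print(f"x = {x}: residual = {r} ({position} the line)")
```

Points with a positive residual sit above the fitted line and points with a negative residual sit below it, matching the second bullet point.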

3.4 Coefficient of Determination and Measures of Strength

Coefficient of Determination; r^2

  • The coefficient of determination gives a quantitative way of determining how much of the variation of the response variable is explained by variation in the explanatory variable.
  • It is represented by a lower-case r with a 2 superscript and can be calculated by squaring the correlation coefficient:

r^{2}=\left(\frac{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)\left(y_{i}-\bar{y}\right)}{(n-1) s_{x} s_{y}}\right)^{2}

  • Calculating the coefficient of determination gives a decimal answer. When interpreting the value, convert it into a percentage (multiply by 100).
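
A minimal sketch of the squaring and percentage steps described above. The value r = 0.8 and the function name `interpret_r_squared` are assumptions for illustration, and the wording of the returned sentence is one possible phrasing rather than a required template.

```python
def interpret_r_squared(r, response="the response variable",
                        explanatory="the explanatory variable"):
    """Square the correlation coefficient and phrase the result as a percentage."""
    r_squared = r ** 2
    percentage = r_squared * 100  # convert the decimal to a percentage
    return (f"{percentage:.1f}% of the variation in {response} "
            f"is explained by the variation in {explanatory}.")


# Hypothetical correlation coefficient, for illustration only
sentence = interpret_r_squared(0.8)
print(sentence)
```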

2.8 Non-Causal Relationships

Observed Association

  • The associations we find by collecting and analysing data are known as observed associations, as they are what we see.
  • It is worth noting an observed association does not necessarily mean there is an actual relationship between the two variables in question, or that their relationship is causal (as we will explore further in this topic).
  • An observed association may be the result of:
    • An actual relationship of some form between the variables.
    • Chance
    • Poor experimental design

Common Response


2.7 Cause and Effect

Correlation and Further Interpretation of the Correlation Coefficient

Note: if you cannot remember how to calculate and interpret Pearson’s correlation coefficient, revise 2.6 Pearson’s Correlation Coefficient.

  • Two variables which share a statistically meaningful association are said to be correlated. In Further Maths, “statistically meaningful” means they have a Pearson’s correlation coefficient which indicates an association (r \geq 0.25 or r \leq -0.25).
  • The strength of a correlation is the same as the strength (weak, moderate or strong) indicated by Pearson’s correlation coefficient.
  • Correlation does not mean causation. Keep in mind that correlation is purely statistical and more information is needed to know the nature of the relationship between two variables (this concept is explored further in 2.8 Non-Causal Relationships).
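
The sketch below shows one way the threshold above could be applied. Only the 0.25 cut-off comes from these notes; the 0.5 and 0.75 boundaries for moderate and strong are assumed here based on commonly used Further Maths guidelines, and the function name `describe_correlation` is hypothetical.

```python
def describe_correlation(r):
    """Classify a Pearson's correlation coefficient by strength and direction."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")

    size = abs(r)
    if size < 0.25:  # threshold stated in the notes
        return "no association"

    # Assumed boundaries (not given in the notes): 0.5 and 0.75
    if size >= 0.75:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    else:
        strength = "weak"

    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(describe_correlation(0.9))
print(describe_correlation(-0.3))
print(describe_correlation(0.1))
```

A label such as "strong positive" describes the statistical association only; it says nothing about whether one variable causes the other.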

2.6 Pearson’s Correlation Coefficient

Meaning and Calculation

  • Pearson’s correlation coefficient provides a quantitative method for determining the strength and direction of a linear association between two numerical variables.
  • It is denoted by a lower-case r and can be calculated using the following formula:

r=\frac{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)\left(y_{i}-\bar{y}\right)}{(n-1) s_{x} s_{y}}

Where s_{x} and s_{y} are the standard deviations of the explanatory and response variables, respectively.
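
The formula above can be sketched in a few lines using only the Python standard library. The dataset is made up for illustration and the function name `pearson_r` is an assumption; `statistics.stdev` returns the sample standard deviation (dividing by n - 1), which is the form the formula's (n - 1) factor pairs with.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Compute Pearson's correlation coefficient from the formula above."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations

    # Sum of the products of deviations from the means
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / ((n - 1) * s_x * s_y)


# Hypothetical data: explanatory values xs, response values ys
xs = [0, 1, 2, 3, 4]
ys = [1, 2, 6, 8, 8]

r = pearson_r(xs, ys)
print(round(r, 4))
```

In practice a CAS calculator would normally be used for this calculation; the code simply mirrors each term of the formula.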

Limitations of using Pearson’s Correlation Coefficient


2.5 Relationships between two Numerical Variables

Guidelines to Analysing Numerical Associations

  • Begin with context: what does the data represent?
  • Identify the explanatory and response variables.
  • Assess the form of the association: is it linear or non-linear, or is there no association?
  • If it is linear, assess the strength (strong, moderate or weak). Ideally, do this using Pearson’s correlation coefficient (detailed in 2.6 Pearson’s Correlation Coefficient). If the raw data is not available, a qualitative assessment will suffice.

2.4 Scatterplots and Association between Numerical Variables

Scatterplots

  • Scatterplots are used to visualise data with explanatory and response variables.
  • They consist of a set of x-y axes, with each data point represented as a dot positioned at its x-value along the horizontal axis and at its y-value along the vertical axis.
  • Scatterplots can be used to see relationships between variables. These relationships can be described in terms of form, direction and strength.

Example

[Figure: example scatterplot]

Form


2.3 Relationships between Numerical and Categorical Variables

Discussing Relationships between Numerical and Categorical Variables

  • Begin with context: what does the data represent?
  • Compare frequencies between the categories of the categorical variable.
  • Compare the numerical data corresponding to each category on the basis of shape, spread, centre and presence of outliers.

Note: if you cannot remember how to choose appropriate measures for centre and spread, revise the notes for 1.6 Describing Numerical Distributions.


2.1 Response and Explanatory Variables

Explanatory Variable

  • The explanatory variable (EV) is the variable used to explain or predict another variable (the response variable).
  • By convention, the explanatory variable is plotted along the x-axis of a graph, if it is numerical.

Response Variable

  • The response variable (RV) is the variable which is explained or predicted by the explanatory variable.
  • By convention, the response variable is plotted along the y-axis of a graph, if it is numerical.

Note: both explanatory and response variables can be either categorical or numerical variables.
