
FM Linear Association

3.7 Least Squares Regression for Transformed Data

Note: if you cannot remember how to interpret least squares regression lines, revise notes for 3.3 Using the Formula for a Fitted Line.

Guideline to Analyse Least Squares Linear Regression Relationships for Transformed Data

  • Analysing a least squares linear fit for transformed data is similar to the process for non-transformed data. However, keep in mind that the association is no longer between the original explanatory and response variables; it is between the transformed variable and the non-transformed variable (which will be either the explanatory or response variable).
  • When interpreting the coefficient of determination, remember that it indicates what percentage of the variation in the transformed variable is explained by the variation in the non-transformed variable, or vice versa (e.g. for a square transformation of the explanatory variable, it indicates what percentage of the variation in y can be explained by the variation in x^2).
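
As a rough illustration of the second point, the sketch below compares the coefficient of determination for y against x with the value for y against x^2. The data values and the helper name `r_squared` are made up for this example; they are not taken from the notes.

```python
def r_squared(xs, ys):
    """Coefficient of determination, i.e. the square of Pearson's r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Hypothetical data that roughly follows a curve of the form y = a + b*x^2
x = [1, 2, 3, 4, 5]
y = [5.1, 13.8, 29.2, 49.9, 77.0]

# Square transformation applied to the explanatory variable only
x_squared = [value ** 2 for value in x]

r2_raw = r_squared(x, y)                  # association between y and x
r2_transformed = r_squared(x_squared, y)  # association between y and x^2

print(f"{r2_transformed * 100:.1f}% of the variation in y "
      f"is explained by the variation in x^2")
```

Note that the interpretation printed at the end names x^2, not x, because the fit was made against the transformed variable.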

3.6 Introduction to Data Transformations

Linearization

  • So far we have only looked at methods of analysing linear associations and not non-linear associations. Luckily, linearization provides a convenient way of transforming non-linear associations into linear ones so that they can be analysed using the same methods.
  • Linearization works by applying a transformation to the data for the explanatory variable, the response variable, or both. In Further Maths, you will only deal with situations where one of the two variables is transformed at a time.
  • Keep in mind that the formula for the linearised model must include the transformation (e.g. the formula for a model which has undergone a square transformation to the explanatory variable will be of the form y=a+bx^2).
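
The idea in the last point can be sketched in code. This is a minimal example assuming a square transformation of the explanatory variable; the data and the function names `least_squares` and `predict` are invented for illustration.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical non-linear data (constructed so that y = 1 + 2x^2 exactly)
x = [1, 2, 3, 4]
y = [3, 9, 19, 33]

# Linearise by squaring the explanatory variable, then fit y against x^2
x_squared = [value ** 2 for value in x]
a, b = least_squares(x_squared, y)

def predict(x_value):
    # The transformation must appear in the model: y = a + b*x^2
    return a + b * x_value ** 2

print(f"y = {a:.2f} + {b:.2f}x^2")
print(predict(2.5))
```

The fit itself is an ordinary straight-line fit; only the values fed into it (x^2 instead of x) and the final formula change.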

Square Transformation


3.5 Residual Plots and Residual Analysis

Residual Plots

  • A residual is the difference between the actual response variable value of a data point and the value predicted by the fitted model at the corresponding explanatory variable value (residual = actual value - predicted value).
  • Data points above the fitted line will have a positive residual value and those below the line will have a negative residual value.
  • A residual plot is similar to a scatterplot with each residual value placed at the same value along the x-axis as the corresponding datapoint, and along the y-axis at its residual value.
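
The points above can be sketched as follows. The data set and the line y = 0.5 + 1.8x are hypothetical (the line happens to be the least squares fit for these four points), and `residuals` is just an illustrative helper name.

```python
def residuals(xs, ys, a, b):
    """Residual = actual y value minus the value predicted by y = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data and fitted line y = 0.5 + 1.8x
x = [1, 2, 3, 4]
y = [2.5, 3.5, 6.5, 7.5]
a, b = 0.5, 1.8

res = residuals(x, y, a, b)

# Each point of the residual plot keeps the original x value
# and uses the residual as its y value.
residual_plot_points = list(zip(x, res))

for x_value, r in residual_plot_points:
    position = "above" if r > 0 else "below"
    print(f"x = {x_value}: residual = {r:+.1f} ({position} the line)")
```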

3.4 Coefficient of Determination and Measures of Strength

Coefficient of Determination; r^2

  • The coefficient of determination gives a quantitative way of determining how much of the variation of the response variable is explained by variation in the explanatory variable.
  • It is represented by a lower-case r with a 2 superscript and can be calculated by squaring the correlation coefficient:

r^{2}=\left(\frac{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)\left(y_{i}-\bar{y}\right)}{(n-1) s_{x} s_{y}}\right)^{2}

  • Calculating the coefficient of determination gives a decimal answer. When interpreting the value, convert it to a percentage (multiply by 100).
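
A short sketch of the formula above, using Python's standard `statistics` module for the means and sample standard deviations. The data values and the helper name `correlation` are assumptions made for the example.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson's r, following the formula above term by term."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # stdev() is the sample standard deviation (divides by n - 1)
    return total / ((n - 1) * stdev(xs) * stdev(ys))

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
r2 = r ** 2  # coefficient of determination as a decimal

# Convert to a percentage when interpreting the value
print(f"{r2 * 100:.1f}% of the variation in y is explained by the variation in x")
```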

3.3 Using the Formula for a Fitted Line

Interpolation

  • After fitting a model to a dataset (through linear regression), we can use that model to estimate values we don’t have data points for.
  • When estimating values that lie within the range of available raw data points, we refer to it as interpolating.
  • Interpolated estimates are generally considered reliable if the fit is strong and sufficient data points were used.

Example: if a linear fit is created using data points with explanatory variable values ranging from 1 to 10, estimating the value of the response variable when the explanatory variable has a value of 2 would be considered interpolation.
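
The same check can be written as a small sketch. The function name `is_interpolation` is hypothetical; it simply tests whether a value lies within the range of the explanatory variable data used for the fit.

```python
def is_interpolation(x_value, xs):
    """True if x_value lies within the range of the raw explanatory data."""
    return min(xs) <= x_value <= max(xs)

# Explanatory variable values ranging from 1 to 10, as in the example above
x_data = list(range(1, 11))

print(is_interpolation(2, x_data))   # inside the range of the data
print(is_interpolation(12, x_data))  # outside the range of the data
```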

Extrapolation


3.2 Modelling Linear Associations

Identifying Explanatory and Response Variables

  • It is important to correctly select the explanatory and response variables when using regression; otherwise the fitted model will describe the wrong relationship.
  • The explanatory variable is the variable which is used to explain or predict the response variable.
  • In a conventional x-y dataset, the x variable is the explanatory variable and y is the response variable.

Fitting Least Squares Models

  • Start by identifying the explanatory and response variables.

3.1 Least Squares Linear Regression

The Idea behind Least Squares Regression

  • In order to conveniently estimate the expected values of one variable based on another, we often create a mathematical model which fits, as closely as possible, the data we have collected. In Further Maths, we will only deal with linear regression, where we try to come up with a straight line that fits our data.
  • In least squares regression, we find that “best fit” by choosing the line that minimises the sum of squared residuals (i.e. we take the vertical difference between each data point and the line, square each difference, then add them all together).
  • The resulting line is of the form

y=a+bx

where y and x are the response and explanatory variables, respectively, and a and b are constants which must be determined.
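
As a minimal sketch of this idea, the code below finds a and b with the standard closed-form least squares solution and then checks that nudging either constant increases the sum of squared residuals. The data values and function names are hypothetical.

```python
def sum_squared_residuals(xs, ys, a, b):
    """Square each vertical difference from the line y = a + b*x and add them."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys):
    """Closed-form values of a and b that minimise the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept
    return a, b

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares(x, y)
best = sum_squared_residuals(x, y, a, b)

# Any other line gives a larger sum of squared residuals
shifted_intercept = sum_squared_residuals(x, y, a + 0.1, b)
shifted_slope = sum_squared_residuals(x, y, a, b - 0.1)

print(f"y = {a:.1f} + {b:.1f}x")
print(best < shifted_intercept and best < shifted_slope)
```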

  • Least squares linear regression is only appropriate if:

2.5 Relationships between two Numerical Variables

Guidelines to Analysing Numerical Associations

  • Begin with context: what does the data represent?
  • Identify the explanatory and response variables.
  • Assess the form of the association: is it linear, is it non-linear, or is there no association?
  • If it is linear, assess the strength (strong, moderate or weak). Ideally, do this using Pearson’s correlation coefficient (detailed in 2.6 Pearson’s Correlation Coefficient). If the raw data is not available, a qualitative assessment will suffice.