
# FM Linear Association

## 3.7 Least Squares Regression for Transformed Data

Note: if you cannot remember how to interpret least squares regression lines, revise notes for 3.3 Using the Formula for a Fitted Line.

### Guideline to Analyse Least Squares Linear Regression Relationships for Transformed Data

• Analysing a least squares linear fit for transformed data is similar to the process for non-transformed data. However, keep in mind that the association is no longer between the original explanatory and response variables; it is between the transformed variable and the non-transformed variable (which will be either the explanatory or the response variable).
• The coefficient of determination indicates what percentage of variation in the transformed variable is explained by variation in the non-transformed variable, or vice versa (e.g. for a squared transformation of the explanatory variable, the coefficient of determination indicates what percentage of variation in y can be explained by variation in x^2).
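
As a rough illustration of the second point, the sketch below computes the coefficient of determination twice: once for y against x, and once for y against x^2. The dataset and the helper name `r_squared` are made up for this example; they are not taken from the notes.

```python
from statistics import mean

def r_squared(xs, ys):
    """Square of Pearson's correlation coefficient between xs and ys."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Hypothetical data that roughly follows y = 2x^2
xs = [1, 2, 3, 4, 5]
ys = [2.1, 7.9, 18.2, 31.8, 50.1]

x_squared = [x ** 2 for x in xs]       # transformed explanatory variable

r2_raw = r_squared(xs, ys)             # variation in y explained by x
r2_sq = r_squared(x_squared, ys)       # variation in y explained by x^2

print(f"y vs x:   {r2_raw * 100:.1f}% of variation in y explained by x")
print(f"y vs x^2: {r2_sq * 100:.1f}% of variation in y explained by x^2")
```

Note that the second percentage is interpreted in terms of x^2, not x.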

## 3.6 Introduction to Data Transformations

### Linearization

• So far we have only looked at methods of analysing linear associations and not non-linear associations. Luckily, linearization provides a convenient way of transforming non-linear associations into linear ones so that they can be analysed using the same methods.
• Linearization works by applying a transformation of some form to the explanatory variable data, the response variable data, or both. In Further Maths, you will only deal with situations requiring one of the two to be transformed at a time.
• Keep in mind that the formula for the linearised model must include the transformation (e.g. the formula for a model which has undergone a square transformation to the explanatory variable will be of the form y=a+bx^2).
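
A minimal sketch of the idea, assuming a square transformation of the explanatory variable: the x values are squared first, and an ordinary least squares line is then fitted to y against x^2, giving a model of the form y=a+bx^2. The data and the helper `fit_line` are hypothetical and exist only for this illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least squares intercept a and slope b for y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b

# Hypothetical non-linear data lying exactly on y = 2 + 3x^2
xs = [1, 2, 3, 4]
ys = [5, 14, 29, 50]

# Apply the square transformation to the explanatory variable only
x_squared = [x ** 2 for x in xs]

# Fit a straight line to (x^2, y); the model is y = a + b*x^2
a, b = fit_line(x_squared, ys)
print(f"y = {a:.2f} + {b:.2f} * x^2")

# Predictions must apply the same transformation to x
prediction_at_5 = a + b * 5 ** 2
```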

### Square Transformation


## 3.5 Residual Plots and Residual Analysis

### Residual Plots

• A residual is the difference between a data point's actual response variable value and the value predicted by the fitted model at the same explanatory variable value (residual = actual value − predicted value).
• Data points above the fitted line will have a positive residual value and those below the line will have a negative residual value.
• A residual plot is similar to a scatterplot: each point sits at the same position along the x-axis as its corresponding data point, and along the y-axis at its residual value.
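
The definitions above can be sketched in a few lines. The fitted line y = 1 + 2x and the data points are assumptions chosen for illustration; in practice a and b would come from a least squares fit.

```python
def residuals(xs, ys, a, b):
    """Residual = actual y minus the y predicted by the line y = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data and a hypothetical fitted line y = 1 + 2x
xs = [1, 2, 3, 4]
ys = [3.5, 4.5, 7.0, 9.5]
res = residuals(xs, ys, a=1, b=2)

# Points for a residual plot: same x value, residual on the y-axis
residual_plot_points = list(zip(xs, res))

for x, r in residual_plot_points:
    position = "above" if r > 0 else "below" if r < 0 else "on"
    print(f"x = {x}: residual = {r:+.1f} ({position} the line)")
```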

## 3.4 Coefficient of Determination and Measures of Strength

### Coefficient of Determination; r^2

• The coefficient of determination gives a quantitative way of determining how much of the variation of the response variable is explained by variation in the explanatory variable.
• It is represented by a lower-case r with a 2 superscript and can be calculated by squaring the correlation coefficient:

r^{2}=\left(\frac{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)\left(y_{i}-\bar{y}\right)}{(n-1) s_{x} s_{y}}\right)^{2}

• Calculating the coefficient of determination gives a decimal answer. When interpreting the value, convert it into a percentage (multiply by 100).
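
The formula above can be evaluated directly, as in the sketch below. The dataset and function name are made up for illustration; `statistics.stdev` is used because it returns the sample standard deviation (n − 1 divisor), which is what s_x and s_y are assumed to denote here.

```python
from statistics import mean, stdev

def coefficient_of_determination(xs, ys):
    """r^2 computed as the square of Pearson's correlation coefficient."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)   # sample standard deviations
    r = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / ((n - 1) * s_x * s_y))
    return r ** 2

# Hypothetical dataset
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r2 = coefficient_of_determination(xs, ys)

# Report as a percentage when interpreting the value
print(f"{r2 * 100:.1f}% of the variation in y is explained by the variation in x")
```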

## 3.3 Using the Formula for a Fitted Line

### Interpolation

• After fitting a model to a dataset (through linear regression), we can use that model to estimate values we don’t have data points for.
• When estimating values that lie within the range of available raw data points, we refer to it as interpolating.
• Interpolation is considered accurate if the fit has high strength and sufficient data points were used.

Example: if a linear fit is created using data points whose explanatory variable values range from 1 to 10, estimating the value of the response variable when the explanatory variable has a value of 2 would be considered interpolation.
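
The example above can be sketched as a simple range check. The coefficients a and b and the helper names are hypothetical; a prediction outside the range of the data is labelled extrapolation.

```python
def predict(a, b, x):
    """Predicted response from the fitted line y = a + b*x."""
    return a + b * x

def is_interpolation(x, xs):
    """True if x lies within the range of the explanatory data used for the fit."""
    return min(xs) <= x <= max(xs)

# Explanatory values ranging from 1 to 10, as in the example
xs = list(range(1, 11))
a, b = 1.0, 2.0   # hypothetical fitted coefficients

for x in (2, 15):
    kind = "interpolation" if is_interpolation(x, xs) else "extrapolation"
    print(f"x = {x}: predicted y = {predict(a, b, x)} ({kind})")
```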

### Extrapolation


## 3.2 Modelling Linear Associations

### Identifying Explanatory and Response Variables

• It is important to correctly select the explanatory and response variables when using regression, or the relationship will be incorrect.
• The explanatory variable is the variable which is used to explain or predict the response variable.
• In a conventional x-y dataset, the x variable is the explanatory variable and y is the response variable.

### Fitting Least Squares Models

• Start by identifying the explanatory and response variables.

## 3.1 Least Squares Linear Regression

### The Idea behind Least Squares Regression

• In order to conveniently estimate the expected values of one variable based on another, we often create a mathematical model which fits, as closely as possible, the data we have collected. In Further Maths, we will only deal with linear regression, where we try to come up with a straight line that fits our data.
• In least squares regression, we try to find that “best fit” by finding a line that minimises the value of the sum of squared residuals (i.e. we take the difference between each datapoint and the line, then square each and add them all together).
• The resulting line is of the form

y=a+bx

where y and x are the response and explanatory variables, respectively, and a and b are constants which must be determined.
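
As a minimal sketch of how a and b can be determined, the code below uses the standard closed-form least squares solution: the slope is the sum of products of deviations divided by the sum of squared x deviations, and the intercept makes the line pass through the point of means. The data and function name are assumptions for illustration only.

```python
from statistics import mean

def least_squares_fit(xs, ys):
    """Return (a, b) for the line y = a + b*x minimising the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: line passes through (x_bar, y_bar)
    return a, b

def sum_squared_residuals(xs, ys, a, b):
    """The quantity that least squares regression minimises."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data lying exactly on y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

a, b = least_squares_fit(xs, ys)
ssr = sum_squared_residuals(xs, ys, a, b)
print(f"y = {a} + {b}x, sum of squared residuals = {ssr}")
```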

• Least squares linear regression is only appropriate if:

## 2.5 Relationships between two Numerical Variables

### Guidelines to Analysing Numerical Associations

• Begin with context: what does the data represent?
• Identify the explanatory and response variables.
• Assess the form of the association: is it linear, is it non-linear, or is there no association?
• If it is linear, assess the strength (strong, moderate or weak). Ideally, do this using Pearson’s correlation coefficient (detailed in 2.6 Pearson’s Correlation Coefficient). If the raw data is not available, a qualitative assessment will suffice.