
# FM Data Analysis

## 4.5 Analysis of De-seasonalised Data

### Linear Regression of Ordinary Time Series Data

• As with any other type of bivariate data, it is often useful to apply linear regression to time series data in order to predict values for which we have no data.
• For time series data, time is always the explanatory variable.
• Unprocessed data with seasonality is generally poorly modelled by a linear fit.

Note: if you cannot remember how to construct and interpret a linear fit, revise notes for 3.1 Least Squares Linear Regression and 3.2 Modelling Linear Associations.
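The least squares fit described above can be sketched in a few lines. This is a minimal illustration with an invented, perfectly linear time series (t = 1, 2, 3, ... as the explanatory variable), not a prescribed method from the course:

```python
# A minimal sketch of fitting a least squares line to time series data,
# with time (t = 1, 2, 3, ...) as the explanatory variable.
# The dataset below is invented for illustration.

def least_squares_fit(t_values, y_values):
    """Return (intercept a, slope b) for the model y = a + b*t."""
    n = len(t_values)
    t_mean = sum(t_values) / n
    y_mean = sum(y_values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(t_values, y_values))
    sxx = sum((t - t_mean) ** 2 for t in t_values)
    b = sxy / sxx          # slope
    a = y_mean - b * t_mean  # intercept
    return a, b

# Perfectly linear data (y = 2 + 3t), so the fit recovers a = 2, b = 3.
t = [1, 2, 3, 4]
y = [5, 8, 11, 14]
a, b = least_squares_fit(t, y)
print(a, b)  # 2.0 3.0
```

In practice you would apply this to de-seasonalised values, since raw seasonal data is poorly modelled by a straight line.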

### Re-seasonalising Data


## 4.4 Introduction to Seasonal Indices

### Seasonal Indices

• Seasonal indices provide a method to de-seasonalise data.
• The seasonal index of a season (month, quarter, period, etc.) compares the average value of that season to the average across all seasons in a cycle.
• A seasonal index of 1 indicates the average value of the season is exactly equal to the average value of the entire cycle.
• A seasonal index greater than 1 indicates the average value of the season is greater than that of the entire cycle (e.g. a seasonal index of 1.2 indicates the season’s average is 20% higher than the cycle’s average).
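The definitions above can be illustrated with a short sketch. The quarterly figures are invented, and the calculation shows both directions: computing indices from one cycle, then using them to de-seasonalise:

```python
# A sketch of computing seasonal indices from one cycle of quarterly
# data (values invented for illustration), then de-seasonalising.

quarterly_sales = [120, 80, 100, 100]  # one full cycle (Q1..Q4)
cycle_mean = sum(quarterly_sales) / len(quarterly_sales)  # 100.0

# Seasonal index = season's value / mean of the cycle.
indices = [v / cycle_mean for v in quarterly_sales]
print(indices)  # [1.2, 0.8, 1.0, 1.0] — Q1 runs 20% above the cycle average

# De-seasonalised value = actual value / seasonal index.
deseasonalised = [v / i for v, i in zip(quarterly_sales, indices)]
print(deseasonalised)  # each value is approximately 100 for this single cycle
```

Note that the indices of a cycle always average to 1, which is a useful check on your working.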

## 4.3 Numerical Smoothing using the Moving Median Method

Note: if you can’t remember the basics of numerical smoothing, revise notes for 4.2 Numerical Smoothing using the Moving Mean Method.

### Moving Median Smoothing

• The moving median smoothing method involves taking the median of each group.
• This method is particularly effective when the exact values of data points are unknown (e.g. if the data is shown in a time series plot without the raw dataset).
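A 3-point moving median can be sketched as follows; the data is invented, and the endpoints are lost because they never sit in the centre of a complete group:

```python
# A minimal sketch of odd-window moving median smoothing (invented data).
from statistics import median

def moving_median(data, window=3):
    """Smooth `data` with an odd-sized moving median; endpoints are lost."""
    return [median(data[i:i + window]) for i in range(len(data) - window + 1)]

raw = [5, 2, 8, 3, 9, 4]
print(moving_median(raw))  # [5, 3, 8, 4]
```

Because the median ignores extreme values within each group, this method is less affected by outliers than moving mean smoothing.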

### Smooth a Single Point using an Odd Moving Median


## 4.2 Numerical Smoothing using the Moving Mean Method

### The Idea of Numerical Smoothing

• Time series plots are often ‘noisy’, with many random fluctuations that make it difficult to analyse the long-term pattern of the data.
• Numerical smoothing provides a method of lessening the impact of those random fluctuations so that the pattern is easier to discern.
• The two methods for numerical smoothing used in Further Maths are moving mean and moving median smoothing.
• In both methods, a new set of values is created by taking a group of data points, finding its mean or median, then moving to the next group (formed by replacing the first data point in the group with the next data point not yet included).
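The group-then-shift process above can be sketched for a 3-point moving mean. The data is invented for illustration:

```python
# A minimal sketch of 3-point moving mean smoothing on invented data,
# following the group-then-shift process described above.

def moving_mean(data, window=3):
    """Smooth `data` with a moving mean; the endpoints are lost."""
    return [sum(data[i:i + window]) / window
            for i in range(len(data) - window + 1)]

raw = [3, 6, 9, 6, 3]
# Groups: [3,6,9] -> 6.0, [6,9,6] -> 7.0, [9,6,3] -> 6.0
print(moving_mean(raw))  # [6.0, 7.0, 6.0]
```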

## 4.1 Introduction to Time Series Plots

### Time Series Plots

• Time series plots are a specific type of graph, where the explanatory variable is time.
• They are used to analyse how a system changes over time.
• When describing a time series plot, the trend, seasonality, irregular fluctuations, structural change and outliers are all important aspects.
• Time series plots can either be shown as a graph with data points connected by lines, or as a dot plot.

## 3.7 Least Squares Regression for Transformed Data

Note: if you cannot remember how to interpret least squares regression lines, revise notes for 3.3 Using the Formula for a Fitted Line.

### Guideline to Analyse Least Squares Linear Regression Relationships for Transformed Data

• Analysing a least squares linear fit for transformed data is similar to the process for non-transformed data; however, you must keep in mind that the association is not between the explanatory and response variables, but between the transformed variable and the non-transformed variable (which will be either the explanatory or the response variable).
• When interpreting the coefficient of determination, it indicates what percentage of variation in the non-transformed variable is explained by variation in the transformed variable, or vice versa (e.g. for an explanatory variable squared transformation, the coefficient indicates what percentage of variation in y can be explained by variation in x^2).

## 3.6 Introduction to Data Transformations

### Linearisation

• So far we have only looked at methods of analysing linear associations, not non-linear associations. Luckily, linearisation provides a convenient way of transforming non-linear associations into linear ones so that they can be analysed using the same methods.
• Linearisation works by applying a transformation of some form to the dataset of either the explanatory or the response variable. In Further Maths, you will only deal with situations requiring one of the datasets to be transformed at a time.
• Keep in mind that the formula for the linearised model must include the transformation (e.g. the formula for a model which has undergone a square transformation of the explanatory variable will be of the form y=a+bx^2).
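A squared transformation of the explanatory variable can be sketched as follows. The data is invented and constructed to follow y = 1 + 2x^2 exactly, so the fit on the transformed variable is perfect:

```python
# A sketch of an explanatory variable squared transformation: the data
# follow y = 1 + 2x^2 exactly (invented), so replacing x with x^2 first
# produces a perfectly linear association.

x = [1, 2, 3, 4]
y = [3, 9, 19, 33]           # y = 1 + 2x^2
x_sq = [v ** 2 for v in x]   # transformed explanatory variable

# Least squares fit on the (x^2, y) pairs:
n = len(x_sq)
mx = sum(x_sq) / n
my = sum(y) / n
b = (sum((u - mx) * (v - my) for u, v in zip(x_sq, y))
     / sum((u - mx) ** 2 for u in x_sq))
a = my - b * mx
print(a, b)  # 1.0 2.0, i.e. the linearised model y = 1 + 2x^2
```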

### Square Transformation


## 3.5 Residual Plots and Residual Analysis

### Residual Plots

• A residual is the difference between a data point's actual response variable value and the value predicted by the fitted model at the corresponding explanatory variable value.
• Data points above the fitted line will have a positive residual value and those below the line will have a negative residual value.
• A residual plot is similar to a scatterplot with each residual value placed at the same value along the x-axis as the corresponding datapoint, and along the y-axis at its residual value.
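Computing residuals is a one-line calculation once the fitted model is known. The line y = 2 + 3x and the data points below are invented for illustration:

```python
# A minimal sketch of computing residuals against a fitted line
# y = 2 + 3x (line and data invented for illustration).

def predicted(x):
    return 2 + 3 * x

data = [(1, 6), (2, 7), (3, 12)]  # (x, y) pairs

# residual = actual y - predicted y at the same x
residuals = [y - predicted(x) for x, y in data]
print(residuals)  # [1, -1, 1]: points above the line have positive residuals
```

Plotting these residuals against x gives the residual plot; a random scatter about zero supports the use of a linear model.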

## 3.4 Coefficient of Determination and Measures of Strength

### Coefficient of Determination; r^2

• The coefficient of determination gives a quantitative way of determining how much of the variation of the response variable is explained by variation in the explanatory variable.
• It is represented by a lower-case r with a 2 superscript and can be calculated by squaring the correlation coefficient:

r^{2}=\left(\frac{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)\left(y_{i}-\bar{y}\right)}{(n-1) s_{x} s_{y}}\right)^{2}

• When calculating the coefficient of determination you will get a decimal answer; however, when interpreting the value you should convert it to a percentage (multiply by 100).
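The formula above can be checked numerically. This sketch uses an invented dataset and the sample standard deviations s_x and s_y, matching the (n-1) in the denominator:

```python
# A sketch of computing r and r^2 from the formula above (invented data).
from statistics import mean, stdev  # stdev is the sample standard deviation

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
n = len(x)
mx, my = mean(x), mean(y)

# r = sum of (x_i - x̄)(y_i - ȳ) / ((n - 1) * s_x * s_y)
r = (sum((a - mx) * (b - my) for a, b in zip(x, y))
     / ((n - 1) * stdev(x) * stdev(y)))
r_squared = r ** 2

# Interpreted as a percentage of variation explained:
print(round(r_squared * 100, 1))
```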

## 3.3 Using the Formula for a Fitted Line

### Interpolation

• After fitting a model to a dataset (through linear regression), we can use that model to estimate values we don’t have data points for.
• Estimating values that lie within the range of the available raw data points is referred to as interpolation.
• Interpolation is considered reliable if the fit has high strength and a sufficient number of data points were used.

Example: if a linear fit is created using data points ranging in value from 1 to 10, estimating the value of the response variable when the explanatory variable has a value of 2 would be considered interpolation.
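The distinction in the example above comes down to a range check, which can be sketched as:

```python
# A minimal sketch distinguishing interpolation from extrapolation,
# based on whether the new x value lies within the range of the data.

def classify(x_new, x_data):
    return ("interpolation" if min(x_data) <= x_new <= max(x_data)
            else "extrapolation")

x_data = [1, 3, 5, 7, 10]    # explanatory values ranging from 1 to 10
print(classify(2, x_data))   # interpolation
print(classify(12, x_data))  # extrapolation
```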

### Extrapolation
