2.4 Scatterplots and Association between Numerical Variables

Scatterplots

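  • Scatterplots are used to visualise the relationship between two numerical variables, with the explanatory variable on the x-axis and the response variable on the y-axis.
  • They consist of x- and y-axes, with each datapoint represented as a dot directly above its x-value and directly to the right of its y-value.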
  • Scatterplots can be used to see relationships between variables. These relationships can be described in terms of form, direction and strength.
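
As a rough illustration of how each datapoint is positioned, the sketch below draws a small text scatterplot. The function name `text_scatter`, the grid size and the study-hours data are all hypothetical, chosen only for this example.

```python
def text_scatter(xs, ys, width=20, height=10):
    """Return a text scatterplot: x runs left to right, y runs bottom to top."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid dividing by zero when every value on an axis is identical.
    x_range = (x_max - x_min) or 1
    y_range = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_range * (width - 1))
        row = round((y - y_min) / y_range * (height - 1))
        # Row 0 of the grid is printed first, so flip it to put small y at the bottom.
        grid[height - 1 - row][col] = "*"
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width


# Hypothetical data: hours studied (explanatory) and test score (response).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 60, 68, 72, 75, 81]
print(text_scatter(hours, scores))
```

With this made-up data the dots rise from the bottom left to the top right, which is how a positive direction appears on a scatterplot.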

Example

Picture 1

Form

2.3 Relationships between Numerical and Categorical Variables

Discussing Relationships between Numerical and Categorical Variables

  • Begin with context: what does the data represent?
  • Compare frequencies between the categories of the categorical variable.
  • Compare the numerical data corresponding to each category on the basis of shape, spread, centre and presence of outliers.

Note: if you cannot remember how to choose appropriate measures for centre and spread, revise the notes for 1.6 Describing Numerical Distributions.
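
The comparison described above can be sketched in code by grouping the numerical values by category and summarising each group. This is a minimal sketch only: the function `summarise_by_category` and the data are hypothetical, and the median and IQR are used here as one possible choice of centre and spread.

```python
from collections import defaultdict
from statistics import median, quantiles


def summarise_by_category(records):
    """Group (category, value) pairs and summarise each group's values."""
    groups = defaultdict(list)
    for category, value in records:
        groups[category].append(value)

    summary = {}
    for category, values in groups.items():
        # quantiles() needs at least two values per group.
        q1, _, q3 = quantiles(values, n=4)
        summary[category] = {
            "count": len(values),       # frequency of the category
            "median": median(values),   # centre
            "iqr": q3 - q1,             # spread
            "min": min(values),         # extremes help flag possible outliers
            "max": max(values),
        }
    return summary


# Hypothetical data: (category, numerical value) pairs.
records = [
    ("A", 12), ("A", 15), ("A", 11), ("A", 14),
    ("B", 20), ("B", 22), ("B", 35), ("B", 21),
]
summary = summarise_by_category(records)
for category, stats in summary.items():
    print(category, stats)
```

In this made-up data, category B has a higher median than category A, and its maximum of 35 sits well away from the rest of its values, so it would be worth checking as a possible outlier.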

1.9 The Normal Distribution

Overview of the Normal Distribution

  • The normal distribution often arises when describing populations and natural phenomena.
  • It is often referred to as the bell curve.
  • Normal distributions are perfectly symmetric about the mean.

Note: real data are rarely perfectly symmetric, but the normal distribution is often a good approximation.

  • A key characteristic of the normal distribution is that the mean and median are equal and correspond to the highest frequency
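
The symmetry and the equal mean and median can be checked with Python's standard-library `statistics.NormalDist`. This is a small sketch; the mean of 170 and standard deviation of 8 are hypothetical values chosen only for illustration.

```python
from math import isclose
from statistics import NormalDist

# Hypothetical normal distribution: mean 170, standard deviation 8.
heights = NormalDist(mu=170, sigma=8)

# The mean, median and mode of a normal distribution coincide.
print(heights.mean, heights.median, heights.mode)

# Symmetry: the curve has the same height at equal distances either side of the mean.
for offset in (1, 5, 10):
    left = heights.pdf(heights.mean - offset)
    right = heights.pdf(heights.mean + offset)
    print(offset, isclose(left, right))

# The curve is highest at the mean.
peak_is_at_mean = heights.pdf(heights.mean) > heights.pdf(heights.mean + 1)
print(peak_is_at_mean)
```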

1.4 Displaying Numerical Data

Dot Plot

  • Dot plots consist of a number line with each individual datapoint plotted as a dot above its value. If multiple datapoints have the same value, their dots are stacked in a column.
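
To show how repeated values stack into columns, here is a minimal text-based sketch. The function `text_dot_plot` and the data are hypothetical, and the sketch assumes whole-number values from 0 to 9 so that the number-line labels stay aligned.

```python
from collections import Counter


def text_dot_plot(values):
    """Return a text dot plot for single-digit whole numbers (an assumption)."""
    counts = Counter(values)
    low, high = min(values), max(values)
    tallest = max(counts.values())
    rows = []
    # Build the plot from the top level down so columns grow upwards.
    for level in range(tallest, 0, -1):
        rows.append(
            " ".join("o" if counts[v] >= level else " " for v in range(low, high + 1))
        )
    # The number line sits underneath the dots.
    rows.append(" ".join(str(v) for v in range(low, high + 1)))
    return "\n".join(rows)


# Hypothetical data: the value 3 occurs three times, so its column is three dots tall.
data = [1, 2, 2, 3, 3, 3, 5]
print(text_dot_plot(data))
```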

Example

Picture 2

Stem Plot

  • Stem plots are useful for displaying small to medium-sized datasets.
  • The leading digit (or digits) of each value is referred to as the stem and is placed on the left side of a vertical line.
  • The final digit of each value is referred to as the leaf and is placed to the right of the line.
  • Multiple data points can share a common stem, but each leaf must represent only one datapoint.

Note: you may also see stem plots referred to as stem and leaf plots.
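
The stem and leaf rules above can be sketched as follows. The function `stem_plot` and the data are hypothetical, and the sketch assumes non-negative whole numbers, taking the tens as the stem and the final digit as the leaf.

```python
from collections import defaultdict


def stem_plot(values):
    """Return a text stem plot, assuming non-negative whole numbers."""
    stems = defaultdict(list)
    for value in sorted(values):
        # Stem: everything except the final digit. Leaf: the final digit.
        stems[value // 10].append(value % 10)

    lines = []
    # Include stems with no leaves so gaps in the data remain visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return "\n".join(lines)


# Hypothetical data: 21 appears twice, so the stem 2 gets two separate leaves of 1.
data = [12, 15, 21, 21, 28, 40, 7]
print(stem_plot(data))
```

Each datapoint contributes exactly one leaf, so the number of leaves in the plot equals the number of values in the dataset.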
