2.4 Scatterplots and Association between Numerical Variables


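  • Scatterplots are used to visualise the relationship between two numerical variables: an explanatory variable and a response variable.
  • They consist of a pair of axes, with the explanatory variable on the horizontal (x) axis and the response variable on the vertical (y) axis. Each data point is drawn as a dot directly above its x-value and level with its y-value.
  • Scatterplots can be used to see relationships between variables. These relationships can be described in terms of form (linear or non-linear), direction (positive or negative) and strength (strong, moderate or weak).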
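
To make the placement of dots concrete, here is a minimal sketch that draws a text-only scatterplot. The function name `ascii_scatter` and the data (hours studied against a quiz score) are hypothetical and chosen only for illustration. The sketch assumes small non-negative whole-number values.

```python
def ascii_scatter(points, width, height):
    """Return a text grid with '*' at each (x, y); x runs right, y runs up."""
    grid = [[" "] * (width + 1) for _ in range(height + 1)]
    for x, y in points:
        # Row 0 is printed first (the top), so flip y to make it increase upwards.
        grid[height - y][x] = "*"
    rows = ["|" + "".join(row) for row in grid]
    rows.append("+" + "-" * (width + 1))  # horizontal axis
    return "\n".join(rows)


# Hypothetical data: (hours studied, quiz score out of 5).
# Hours studied is the explanatory variable (x); score is the response (y).
points = [(0, 1), (1, 1), (2, 2), (3, 3), (4, 3), (5, 5)]
plot = ascii_scatter(points, width=5, height=5)
print(plot)
```

In this made-up example the dots rise from left to right, so the direction would be described as positive.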


Picture 1


2.3 Relationships between Numerical and Categorical Variables

Discussing Relationships between Numerical and Categorical Variables

  • Begin with context: what does the data represent?
  • Compare frequencies between the categories of the categorical variable.
  • Compare the numerical data corresponding to each category on the basis of shape, spread, centre and presence of outliers.

Note: if you cannot remember how to choose appropriate measures for centre and spread, revise the notes for 1.6 Describing Numerical Distributions.
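
As a rough illustration of the comparison steps above, the sketch below groups numerical values by category and reports the frequency, centre and spread for each group. The data (travel time in minutes by mode of transport) and the helper name `summarise` are hypothetical. The median and interquartile range are used here as one possible choice of centre and spread; quartile conventions differ between textbooks and software, so the IQR values may not match hand calculations exactly.

```python
from statistics import median, quantiles


def summarise(values):
    """Return frequency, centre (median) and spread (IQR) for one category."""
    q1, _, q3 = quantiles(values, n=4)  # Python's default 'exclusive' method
    return {"n": len(values), "median": median(values), "iqr": q3 - q1}


# Hypothetical data: (mode of transport, travel time in minutes).
data = [
    ("bus", 20), ("bus", 25), ("bus", 30), ("bus", 35), ("bus", 40),
    ("car", 10), ("car", 12), ("car", 15), ("car", 18), ("car", 45),
]

# Collect the numerical values belonging to each category.
groups = {}
for category, value in data:
    groups.setdefault(category, []).append(value)

summary = {category: summarise(values) for category, values in groups.items()}
for category, stats in summary.items():
    print(category, stats)
```

In this made-up data the 45-minute car trip sits well away from the other car values, which is the kind of possible outlier worth mentioning in a written comparison.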

1.9 The Normal Distribution

Overview of the Normal Distribution

  • Many measurements taken from populations and from nature are approximately normally distributed.
  • It is often referred to as the bell curve.
  • Normal distributions are assumed to be perfectly symmetric.

Note: real data are rarely perfectly symmetric, but the normal distribution is often a good approximation.

  • A key characteristic of the normal distribution is that the mean and median are equal and sit at the peak of the curve, where the frequency is highest.

1.4 Displaying Numerical Data

Dot Plot

  • Dot plots consist of a number line with each individual data point plotted as a dot above its value. If multiple data points have the same value, the dots are stacked in a column.
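
The stacking rule can be sketched in a few lines of code. This is a minimal text-only version that assumes single-digit whole-number data so that the columns line up; the function name `dot_plot` and the data are hypothetical.

```python
from collections import Counter


def dot_plot(values):
    """Return a text dot plot: one 'o' per data point, stacked above its value."""
    counts = Counter(values)
    ticks = range(min(values), max(values) + 1)  # the number line
    tallest = max(counts.values())
    rows = []
    # Build the plot from the top row down; a value gets a dot in a row
    # only if it occurs at least that many times.
    for level in range(tallest, 0, -1):
        row = " ".join("o" if counts[v] >= level else " " for v in ticks)
        rows.append(row.rstrip())
    rows.append(" ".join(str(v) for v in ticks))
    return "\n".join(rows)


# Hypothetical data: number of siblings reported by seven students.
plot = dot_plot([1, 2, 2, 3, 3, 3, 5])
print(plot)
```

The value 3 occurs three times, so its column is three dots tall, while 4 does not occur and its column is empty.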


Picture 2

Stem Plot

  • Stem plots are useful for displaying small to medium-sized datasets.
  • The leading digit or digits of each value are referred to as the stem and are placed on the left side of a vertical line.
  • The final digit of each value is referred to as the leaf and is placed to the right of the line.
  • Multiple data points can share a common stem, but each leaf must represent only one data point.

Note: you may also see stem plots referred to as stem and leaf plots.
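
The construction described above can be sketched as follows. This is a minimal version that assumes non-negative whole numbers, with the tens (and above) as the stem and the units digit as the leaf; the function name `stem_plot` and the data are hypothetical.

```python
from collections import defaultdict


def stem_plot(values):
    """Return a text stem plot: stem = value // 10, leaf = value % 10."""
    leaves = defaultdict(list)
    for value in sorted(values):  # sorting keeps the leaves in order
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)  # one leaf per data point
    lines = []
    # Include every stem between the smallest and largest, even if it has no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        row = f"{stem} | " + " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(row.rstrip())
    return "\n".join(lines)


# Hypothetical data: eight test scores.
plot = stem_plot([12, 15, 21, 21, 34, 38, 39, 50])
print(plot)
```

The two scores of 21 share the stem 2 but each gets its own leaf, and the empty stem 4 is kept so that the gap in the data remains visible.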

