FM Data Distributions

1.10 Further Statistical Concepts

Population

  • In statistics, a population is the complete set of people, objects or events that share a defined set of characteristics.
  • When dealing with data representing an entire population, we use the following symbols for population parameters:

Mean: \mu (Greek symbol mu)

Standard Deviation: \sigma (Greek symbol sigma)

Examples

Examples of populations can include dogs in Melbourne, marbles in a sack or visits to a zoo.
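As a rough illustration of the two population parameters, the sketch below treats a small, made-up list of values as an entire population and computes \mu and \sigma with Python's standard `statistics` module. The data and variable names are hypothetical and are not taken from these notes. `pstdev` divides by the population size, so it is only the appropriate choice when the data really does cover the whole population.

```python
import statistics

# Hypothetical population: the weight (in grams) of every marble in one sack.
population = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.mean(population)       # population mean
sigma = statistics.pstdev(population)  # population standard deviation (divides by N)

print(f"mu = {mu}, sigma = {sigma}")
```

For this made-up data the mean works out to 5 and the standard deviation to 2.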

Sample

Read More »1.10 Further Statistical Concepts

1.9 The Normal Distribution

Overview of the Normal Distribution

  • The normal distribution appears often in population and natural distributions.
  • It is often referred to as the bell curve.
  • The normal distribution is perfectly symmetric about its mean.

Note: real data is rarely perfectly symmetric, but the normal distribution is often a close approximation.
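The symmetry described above can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module with hypothetical parameters (a mean of 100 and a standard deviation of 15, chosen only for illustration) and compares the height of the curve at equal distances either side of the mean.

```python
import math
from statistics import NormalDist

# Hypothetical parameters, chosen only for illustration.
dist = NormalDist(mu=100, sigma=15)

# Compare the curve's height at equal distances below and above the mean.
for distance in (5, 15, 30):
    left = dist.pdf(dist.mean - distance)
    right = dist.pdf(dist.mean + distance)
    print(distance, math.isclose(left, right))
```

Each comparison should report that the two heights match, which is what a mirror-image curve implies.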

  • A key characteristic of the normal distribution is that the mean and median are equal and correspond to the highest frequency
Read More »1.9 The Normal Distribution

1.7 Box Plots and the Five Number Summary

The Box Plot

  • The box plot is a graphical tool used to analyse the shape, spread and outliers of a numerical distribution.
  • It consists of a box with the bottom drawn at quartile 1 (Q1) and the top at quartile 3 (Q3), a line drawn through the box at the median, and a whisker extending from each end of the box to the smallest and largest values that lie within the lower and upper fences.
  • If the median line is in the middle of the box, the distribution is approximately symmetric. If it is drawn closer to the bottom of the box, the distribution is positively skewed; if it is drawn closer to the top, it is negatively skewed.
  • If the distribution has any outliers, they are drawn as dots or crosses at their respective values along the y-axis, in line with the box.
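The values needed to draw a box plot can be sketched in a few lines of Python. This excerpt does not define the fences or the quartile method, so two common conventions are assumed here: quartiles are the medians of the lower and upper halves of the sorted data, and the fences sit 1.5 × IQR beyond Q1 and Q3. Whisker ends are taken as the most extreme values that are not outliers. The function names and the dataset are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least two values."""
    data = sorted(values)
    half = len(data) // 2
    lower_half = data[:half]                   # values below the median position
    upper_half = data[half + len(data) % 2:]   # skips the middle value when n is odd
    return data[0], median(lower_half), median(data), median(upper_half), data[-1]

def box_plot_features(values):
    """Compute the box, fences, whisker ends and outliers (1.5 x IQR assumed)."""
    minimum, q1, med, q3, maximum = five_number_summary(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1, "median": med, "q3": q3,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }

# Hypothetical dataset with one unusually large value.
features = box_plot_features([2, 4, 5, 7, 8, 9, 10, 12, 30])
print(features)
```

In this made-up dataset the median (8) sits slightly above the middle of the box (Q1 = 4.5, Q3 = 11), and the value 30 lies beyond the upper fence, so it would be plotted as an outlier rather than as the end of a whisker.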
Read More »1.7 Box Plots and the Five Number Summary

1.6 Describing Numerical Distributions

Shape

  • The shape of a numerical distribution relies on two factors: symmetry and outliers.
  • If you can draw a vertical line through some point in the distribution whereby the distribution to the left of the line looks similar to a mirror image of the distribution to the right of it, it is an approximately symmetrical distribution. If this is not the case, the distribution is asymmetric.

Note: in some cases, you may find situations where the distribution has perfect symmetry. In these situations, you can drop the “approximately” term and refer to it simply as symmetrical.

Read More »1.6 Describing Numerical Distributions

1.5 Basic Statistical Concepts

Mean

  • The mean of a numerical distribution is found by summing up the values of all individual data points, then dividing by the number of data points.
  • It is represented by either a capital letter with a bar drawn above it, or the Greek symbol mu (µ):

\bar{X}=\frac{\sum_{i=1}^{N} x_{i}}{N}

Where N is the total number of data points, and x_{i} represents the i’th data point.

Note: the symbol \Sigma means “sum of”, so \sum_{i=1}^{N} x_{i} represents the sum of all individual data points (from data point 1 to data point N).
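The formula translates directly into code. The following is a minimal sketch, with a hypothetical function name and made-up data, that sums the data points and divides by N.

```python
def mean(values):
    """Sum every data point, then divide by the number of data points N."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / n

data = [3, 5, 8, 4]   # hypothetical data points x_1 to x_4
x_bar = mean(data)    # (3 + 5 + 8 + 4) / 4
print(x_bar)
```

Here the sum is 20 and N is 4, so the mean is 5.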

Read More »1.5 Basic Statistical Concepts

1.3 Statistical Analysis of Categorical Distributions

Answering Statistical Questions on Categorical Distributions

Mode

  • The mode of categorical data refers to the category with the highest frequency.

Note: the mode of a categorical distribution is also known as the modal category or dominant category.

Example

Given the bar chart:

Bar Chart for Categorical Data

Red has the highest frequency and so it is the modal category.
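Finding the modal category amounts to picking the category with the largest frequency. The sketch below uses hypothetical frequencies, since the values shown in the chart are not reproduced here; only the fact that Red is the tallest bar is taken from the example.

```python
# Hypothetical frequencies; only "Red is highest" comes from the example.
frequencies = {"Red": 5, "Blue": 3, "Green": 2}

# The modal category is the key with the largest frequency.
modal_category = max(frequencies, key=frequencies.get)
print(modal_category)
```

If two categories tie for the highest frequency, `max` returns only the first one it encounters, so a tie would need to be handled separately.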

Guidelines to analysing categorical distributions

Read More »1.3 Statistical Analysis of Categorical Distributions

1.2 Displaying Distributions of Categorical Data

Visualising Categorical Data

Frequency

  • The number of times a particular value or category occurs is known as the frequency. This is often used as the basis for displaying and analysing categorical data.

Example

In the following dataset of colours:

Red Red Blue Red

The frequency of each colour is:

Red: 3

Blue: 1
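The same tally can be produced in code. As a small sketch, Python's `collections.Counter` counts how many times each value occurs in the dataset from the example above (the variable names are illustrative).

```python
from collections import Counter

colours = ["Red", "Red", "Blue", "Red"]

# Counter maps each distinct value to the number of times it occurs.
frequency = Counter(colours)

for colour, count in frequency.items():
    print(f"{colour}: {count}")
```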

Percentage

  • The proportion of the total data points that belong to a particular group, expressed out of 100, is known as the percentage.
  • This can be calculated using the formula:
Read More »1.2 Displaying Distributions of Categorical Data

1.1 Overview of Data Types

Categorical Data

  • Data that is sorted into groups is considered categorical data.

Nominal Data

      • Categorical data with no hierarchy (i.e. no category is “greater than” another) is considered nominal data.

Example

Eye colour can be considered nominal data, as each person’s eye colour can be placed into a group and there is no hierarchy between the groups.

Ordinal Data

Read More »1.1 Overview of Data Types