
# FM Data Distributions

## 1.10 Further Statistical Concepts

### Population

• In statistics, a population is the entire group of people, objects or events defined by a shared set of characteristics.
• When dealing with data representing an entire population, we use the following symbols for population parameters:

Mean: \mu (Greek symbol mu)

Standard Deviation: \sigma (Greek symbol sigma)

Examples

Examples of populations can include dogs in Melbourne, marbles in a sack or visits to a zoo.
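As a rough illustration of the two population parameters, the sketch below computes \mu and \sigma with Python's standard `statistics` module. The data are invented, and the list is assumed to hold every member of the population, which is why `pstdev` (the population standard deviation) is used rather than `stdev`.

```python
import statistics

# Hypothetical population: the mass in grams of every marble in a sack.
population = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.mean(population)       # population mean
sigma = statistics.pstdev(population)  # population standard deviation (divides by N)

print(mu, sigma)  # 5 2.0
```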

### Sample


## 1.9 The Normal Distribution

### Overview of the Normal Distribution

• The normal distribution appears often in population and natural distributions.
• It is often referred to as the bell curve.
• Normal distributions are assumed to be perfectly symmetric.

Note: real data is rarely perfectly symmetric, but the normal distribution is often a good approximation.

• A key characteristic of the normal distribution is that the mean and median are equal and correspond to the highest frequency
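The properties listed above can be checked numerically. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the mean of 100 and standard deviation of 15 are arbitrary values chosen for illustration.

```python
import math
from statistics import NormalDist

# Hypothetical normal distribution with mean 100 and standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

# The mean and median of a normal distribution are equal.
same_centre = dist.mean == dist.median

# The curve is highest at the mean ...
peak = dist.pdf(dist.mean)
left = dist.pdf(dist.mean - 10)
right = dist.pdf(dist.mean + 10)

# ... and symmetric about it: equal distances either side give equal heights.
symmetric = math.isclose(left, right)

print(same_centre, symmetric, peak > left)  # True True True
```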

## 1.8 Statistical Analysis of Numerical Distributions

### Guide to Analysing a Numerical Distribution

• Begin with some context: what does the data represent?
• Always mention the minimum, centre and maximum.
• Check for outliers and mention if there are any.
• Describe the shape of the distribution.
• If there are outliers, mention the values of the lower and upper fences.
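One way to carry out the outlier check in the guide above is sketched below. It assumes the common 1.5 × IQR rule for the fences and the median-of-halves method for the quartiles, neither of which is spelled out in this excerpt. The data and the helper names `quartiles` and `find_outliers` are made up for illustration.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method (an assumption)."""
    data = sorted(values)
    half = len(data) // 2  # needs at least two values
    lower, upper = data[:half], data[-half:]  # middle value is excluded when n is odd
    return statistics.median(lower), statistics.median(data), statistics.median(upper)

def find_outliers(values):
    """Return (lower fence, upper fence, outliers) under the assumed 1.5 x IQR rule."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return lower_fence, upper_fence, outliers

# Hypothetical data set.
data = [1, 12, 13, 14, 15, 16, 17, 18, 19, 40]
lower_fence, upper_fence, outliers = find_outliers(data)
print(lower_fence, upper_fence, outliers)  # 5.5 25.5 [1, 40]
```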

## 1.7 Box Plots and the Five Number Summary

### The Box Plot

• The box plot is a graphical tool used to analyse the shape, spread and outliers of a numerical distribution.
• It consists of a box with the bottom drawn at quartile 1 (Q1) and the top at quartile 3 (Q3), a line through the box at the median, and a line (whisker) from each end of the box to the smallest and largest values that lie within the lower and upper fences.
• If the median line is in the middle of the box, the distribution is approximately symmetric. If it is closer to the bottom of the box, the distribution is positively skewed; if it is closer to the top, it is negatively skewed.
• If the distribution has any outliers, they are represented as dots or crosses at their respective values along the y-axis, in line with the box.
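The values a box plot is drawn from can be computed directly. The sketch below is illustrative only: it assumes the median-of-halves method for the quartiles, uses invented data, and the function names `five_number_summary` and `shape_from_box` are hypothetical. The shape rule simply compares how far the median sits from each end of the box, as described above.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum); quartiles by median of halves."""
    data = sorted(values)
    half = len(data) // 2  # needs at least two values
    q1 = statistics.median(data[:half])
    q3 = statistics.median(data[-half:])
    return data[0], q1, statistics.median(data), q3, data[-1]

def shape_from_box(q1, median, q3):
    """Describe the shape from where the median line sits inside the box."""
    lower_gap = median - q1  # distance from the bottom of the box
    upper_gap = q3 - median  # distance from the top of the box
    if lower_gap < upper_gap:
        return "positively skewed"
    if lower_gap > upper_gap:
        return "negatively skewed"
    return "approximately symmetric"

# Hypothetical data set.
data = [1, 2, 2, 3, 3, 4, 5, 7, 9, 12]
minimum, q1, median, q3, maximum = five_number_summary(data)
shape = shape_from_box(q1, median, q3)
print(minimum, q1, median, q3, maximum, shape)  # 1 2 3.5 7 12 positively skewed
```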

## 1.6 Describing Numerical Distributions

### Shape

• The shape of a numerical distribution relies on two factors: symmetry and outliers.
• If you can draw a vertical line through some point in the distribution whereby the distribution to the left of the line looks similar to a mirror image of the distribution to the right of it, it is an approximately symmetrical distribution. If this is not the case, the distribution is asymmetric.

Note: in some cases, you may find situations where the distribution has perfect symmetry. In these situations, you can drop the “approximately” term and refer to it simply as symmetrical.


## 1.5 Basic Statistical Concepts

### Mean

• The mean of a numerical distribution is found by summing up the values of all individual data points, then dividing by the number of data points.
• It is represented by a letter with a bar drawn above it (for example \bar{X}) or, when the data is an entire population, by the Greek symbol mu (\mu):

\bar{X}=\frac{\sum_{i=1}^{N} x_{i}}{N}

Where N is the total number of data points, and x_{i} represents the i'th data point.

Note: the symbol \Sigma is short for “sum of”, so \sum_{i=1}^{N} x_{i} represents the sum of all individual data points (from datapoint 1, to datapoint N)
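The formula translates almost line for line into code. The sketch below writes the sum out as an explicit loop to mirror \sum_{i=1}^{N} x_{i}; the function name `mean` and the data are chosen for illustration only.

```python
def mean(values):
    """Sum every data point x_i, then divide by the number of data points N."""
    total = 0
    for x_i in values:   # i runs from data point 1 to data point N
        total += x_i
    return total / len(values)

# Hypothetical data set with N = 4.
data = [3, 5, 7, 9]
x_bar = mean(data)
print(x_bar)  # 6.0
```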


## 1.4 Displaying Numerical Data

### Dot Plot

• Dot plots consist of a number line with each individual data point shown as a dot above its value. If multiple data points have…

## 1.3 Statistical Analysis of Categorical Distributions

Answering Statistical Questions on Categorical Distributions

### Mode

• The mode of categorical data refers to the category with the highest frequency.

Note: the mode of a categorical distribution is also known as the modal category, or dominant category

Example

Given the bar chart:

Red has the highest frequency and so it is the modal category.
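Finding the modal category amounts to picking the category with the largest frequency. The bar chart is not reproduced in this excerpt, so the frequencies below are hypothetical and only chosen so that Red comes out on top.

```python
# Hypothetical bar chart heights (frequencies); the original chart is not shown.
frequencies = {"Red": 5, "Blue": 3, "Green": 2}

# The modal category is the key with the highest frequency.
modal_category = max(frequencies, key=frequencies.get)
print(modal_category)  # Red
```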

### Guidelines to analysing categorical distributions


## 1.2 Displaying Distributions of Categorical Data

Visualising Categorical Data

### Frequency

• The number of times a particular value or category occurs is known as the frequency. This is often used as the basis for displaying and analysing categorical data.

Example

In the following dataset of colours:

Red Red Blue Red

The frequency of each colour is:

Red: 3

Blue: 1
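The same tally can be sketched in code with `collections.Counter` from the Python standard library, using the dataset from the example above.

```python
from collections import Counter

# The dataset of colours from the example.
colours = ["Red", "Red", "Blue", "Red"]

# Counter tallies how many times each category occurs.
frequencies = Counter(colours)
print(frequencies["Red"], frequencies["Blue"])  # 3 1
```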

### Percentage

• The proportion of the total data points that belong to a particular group, expressed out of 100, is known as the percentage.
• This can be calculated using the formula:
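The formula itself is cut off in this excerpt. Assuming the usual definition (the frequency of a category divided by the total number of data points, multiplied by 100), a minimal sketch using the colour dataset from the frequency example might look like this:

```python
from collections import Counter

# The dataset of colours from the frequency example.
colours = ["Red", "Red", "Blue", "Red"]

frequencies = Counter(colours)
total = len(colours)

# Assumed definition: percentage = frequency / total number of data points * 100
percentages = {colour: count / total * 100 for colour, count in frequencies.items()}
print(percentages)  # {'Red': 75.0, 'Blue': 25.0}
```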

## 1.1 Overview of Data Types

### Categorical Data

• Data which is sorted into groups is considered categorical data

Nominal Data

• Categorical data with no hierarchy (i.e. one category is not “greater than” another) is considered nominal data

Example

Eye colour can be considered a nominal data type as the data (each person’s eye colour) can be placed into groups and there is no hierarchy

Ordinal Data
