# Observations

Observations are the set of collected data or values used for a specific analysis in statistics.

## What are observations?

Example:

If you want to make a statistical analysis of the students’ allowance in a school, each student’s allowance is an observation.

Say there are 25 students. A ranked list of each student’s monthly allowance is given by:

10, 15, 25, 40, 40, 50, 50, 60, 60, 60, 60, 75, 75, 80, 80, 100, 100, 110, 125, 150, 150, 150, 175, 200, 220

## Concepts of observations

Observation:
An observation is one of the values appearing in the analysis. In the example above each number is an observation.

Mode:
The value that appears most often in a set of observations. In the example above 60 is the mode, because it appears four times. If several values share the highest number of appearances, there is more than one mode.

Minimum:
The smallest value that appears. In the example above the minimum is 10.

Maximum:
The greatest value that appears. In the example above the maximum is 220.

Range:
The difference between maximum and minimum. In the example above the range is 220-10=210.

Mean:
The average of all the values, that is, the sum of all the values divided by the number of values. In the example above the mean is $\frac{2260}{25} = 90.4$, because the sum of the observations is 2260, and the number of observations is 25.

Frequency:
The number of times a value appears. In the example above the value 10 has a frequency of 1 and the value 60 has a frequency of 4.
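
The concepts so far can be checked with a few lines of Python. This is only a minimal sketch using the standard library; the variable names (`allowances`, `value_range` and so on) are chosen for this illustration.

```python
from collections import Counter
from statistics import mean, multimode

# Monthly allowances of the 25 students from the example above.
allowances = [10, 15, 25, 40, 40, 50, 50, 60, 60, 60, 60, 75, 75,
              80, 80, 100, 100, 110, 125, 150, 150, 150, 175, 200, 220]

minimum = min(allowances)
maximum = max(allowances)
value_range = maximum - minimum    # difference between maximum and minimum
average = mean(allowances)         # sum of the values / number of values
frequency = Counter(allowances)    # maps each value to how often it appears
modes = multimode(allowances)      # a list, since there can be several modes

print(minimum, maximum, value_range)   # 10 220 210
print(sum(allowances), average)        # 2260 90.4
print(frequency[10], frequency[60])    # 1 4
print(modes)                           # [60]
```

`multimode` returns a list rather than a single number, which matches the remark that a set of observations can have more than one mode.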

Median:
In a ranked set of data, with the smallest value first, the median splits off the first 50% of the observations from the other 50%. In the example above the median is the thirteenth observation, the value of which is 75, because that is the observation in the middle of the 25 observations. So the median tells us that 50% of the students are paid \$75 or less a month.

First quartile (or lower quartile):
In a ranked set of data, with the smallest value first, the first quartile splits off the first 25% of the observations from the last 75%. In the example above the first quartile is positioned between the sixth and the seventh observation, because that is where the lower half of the data set (the twelve observations below the median) is split into two halves. When the quartile is positioned between two observations, we need to calculate the average of the two observations. In this case the value of both the sixth and the seventh observation is 50, which gives the average value of 50. Thus the first quartile is 50. This tells us that 25% of the students are paid \$50 or less a month.

Third quartile (or upper quartile):
In a ranked set of data, with the smallest value first, the third quartile splits off the last 25% of the observations from the first 75%. In the example above the third quartile is positioned between the nineteenth and the twentieth observation, because that is where the upper half of the data set is split into two halves. (Please note that when counting the observations of the upper half of the data set, the thirteenth observation, the median, is not included.) Since the quartile is positioned between two observations, we need to calculate the average of the two observations. In this case the value of the nineteenth observation is 125 and the value of the twentieth observation is 150, the average of which is 137.5. Thus the third quartile is 137.5. This tells us that 75% of the students are paid \$137.50 or less a month.
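
The median and quartile rules described above can be sketched in Python as follows. The helper name `quartiles` is made up for this illustration, and the code assumes at least two observations. It follows the convention used here (the median is left out of both halves when the number of observations is odd); other textbooks and software use slightly different conventions and may give other quartile values.

```python
from statistics import median

allowances = [10, 15, 25, 40, 40, 50, 50, 60, 60, 60, 60, 75, 75,
              80, 80, 100, 100, 110, 125, 150, 150, 150, 175, 200, 220]

def quartiles(values):
    """Return (first quartile, median, third quartile) by splitting into halves."""
    ranked = sorted(values)                # smallest value first
    n = len(ranked)
    lower_half = ranked[: n // 2]          # observations before the middle
    upper_half = ranked[(n + 1) // 2:]     # skips the median itself when n is odd
    return median(lower_half), median(ranked), median(upper_half)

q1, med, q3 = quartiles(allowances)
print(q1, med, q3)   # 50.0 75 137.5
```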

## Class interval

Class intervals are used to organize data in groups, which enables you to get an idea of how the data are distributed.

There are different ways to denote class intervals, one of which is using square brackets.

For example:

The interval ]0;50] is the set of numbers from 0 to 50, excluding 0 and including 50.

The interval [0;50[ is the set of numbers from 0 to 50, including 0 and excluding 50.

The observations from the allowance-example above can be organized in intervals of 50.

Example: 25 students’ allowance divided into intervals:

| Intervals | ]0;50] | ]50;100] | ]100;150] | ]150;200] | ]200;250] |
|---|---|---|---|---|---|
| Frequency | 7 | 10 | 5 | 2 | 1 |
| Percent frequency | 28% | 40% | 20% | 8% | 4% |
| Cumulative frequency | 7 | 17 | 22 | 24 | 25 |
| Cumulative percent frequency | 28% | 68% | 88% | 96% | 100% |
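
To illustrate how observations end up in intervals of the form ]lower;upper], here is a small Python sketch. The helper `interval_of` and the constant `WIDTH` are hypothetical names, and the code assumes whole-number observations and intervals of equal width starting at 0.

```python
import math
from collections import Counter

allowances = [10, 15, 25, 40, 40, 50, 50, 60, 60, 60, 60, 75, 75,
              80, 80, 100, 100, 110, 125, 150, 150, 150, 175, 200, 220]
WIDTH = 50   # interval width used in the table above

def interval_of(value, width=WIDTH):
    """Return (lower, upper) for the interval ]lower;upper] containing value."""
    upper = math.ceil(value / width) * width   # the upper end point is included
    return upper - width, upper

counts = Counter(interval_of(v) for v in allowances)
for (lower, upper), freq in sorted(counts.items()):
    print(f"]{lower};{upper}]  {freq}")
# ]0;50]  7
# ]50;100]  10
# ]100;150]  5
# ]150;200]  2
# ]200;250]  1
```

Rounding up to the next multiple of the width is what places a value such as 50 in ]0;50] rather than in ]50;100].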

## Concepts of class intervals

Frequency:
The number of observations belonging to a given interval.

Percent frequency:
The number of observations belonging to a given interval, expressed as a percentage of the total number of observations.
For example, there are 7 observations belonging to the interval ]0;50], out of a total number of 25 observations. That makes the percent frequency of observations in that interval: $\frac{7}{25} \cdot 100\% = 28\%$

Cumulative frequency:
The frequency of a given interval added to the frequencies of all the previous intervals.
For example, the cumulative frequency of observations for the interval ]100;150] is given by: 7+10+5=22

Cumulative percent frequency:
The cumulative frequency given as a percentage of the total number of observations.

Modal class interval:
The interval having the highest frequency of observations. In the table above ]50;100] is the modal class interval.
If more than one interval has the highest frequency, there is more than one modal class interval.
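
Starting from the frequencies alone, the other rows of the table and the modal class interval can be derived as in the following Python sketch. The list names are chosen for this illustration only.

```python
from itertools import accumulate

intervals = ["]0;50]", "]50;100]", "]100;150]", "]150;200]", "]200;250]"]
frequencies = [7, 10, 5, 2, 1]

total = sum(frequencies)                              # 25 observations
percent = [100 * f / total for f in frequencies]
cumulative = list(accumulate(frequencies))            # running sum of frequencies
cumulative_percent = [100 * c / total for c in cumulative]

# Every interval that reaches the highest frequency is a modal class interval.
highest = max(frequencies)
modal_classes = [i for i, f in zip(intervals, frequencies) if f == highest]

print(percent)             # [28.0, 40.0, 20.0, 8.0, 4.0]
print(cumulative)          # [7, 17, 22, 24, 25]
print(cumulative_percent)  # [28.0, 68.0, 88.0, 96.0, 100.0]
print(modal_classes)       # [']50;100]']
```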

Cumulative percent frequency graph:
When you want to visualize data arranged in class intervals, you would usually make a cumulative percent frequency graph like the one below. With such a graph it’s easy to show the first quartile, the median and the third quartile (the dotted lines).
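
Reading a quartile off a cumulative percent frequency graph can also be done numerically. The Python sketch below assumes the graph consists of straight line segments through the upper end point of each interval, starting at 0% at the value 0, and interpolates linearly along those segments. The function name `value_at_percent` is made up for this illustration.

```python
def value_at_percent(target, bounds, cumulative_percent):
    """Return the value at which the graph reaches the target percent.

    bounds: interval end points, e.g. [0, 50, 100, ...]
    cumulative_percent: height of the graph at each end point, starting at 0.
    """
    points = list(zip(bounds, cumulative_percent))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1 and y1 > y0:
            # Linear interpolation along the segment from (x0, y0) to (x1, y1).
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target percent is outside the graph")

bounds = [0, 50, 100, 150, 200, 250]
cumulative_percent = [0, 28, 68, 88, 96, 100]

for target in (25, 50, 75):
    print(target, round(value_at_percent(target, bounds, cumulative_percent), 1))
# 25 44.6
# 50 77.5
# 75 117.5
```

Because the grouped data no longer contain the individual observations, these readings are estimates and need not match the quartiles found from the ranked list of observations.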

Class interval arithmetic mean:
The class interval arithmetic mean is an approximate average of all the observations, calculated from each interval’s midpoint and its frequency of observations. It is less precise than the mean calculated from the individual observations.

It’s easier to calculate the class interval arithmetic mean if you put the data in a table, like the one below.

| Intervals | ]0;50] | ]50;100] | ]100;150] | ]150;200] | ]200;250] |
|---|---|---|---|---|---|
| Midpoints of the intervals | 25 | 75 | 125 | 175 | 225 |
| Frequency | 7 | 10 | 5 | 2 | 1 |
| Total value in each interval | 7 x 25 | 10 x 75 | 5 x 125 | 2 x 175 | 1 x 225 |

First we calculate the sum of all intervals’ total values (in this case the total amount of the 25 students’ allowances).

7x25 + 10x75 + 5x125 + 2x175 + 1x225 = 175 + 750 + 625 + 350 + 225 = 2125

Then we calculate the class interval arithmetic mean (in this case the average amount of allowance per student) by dividing the total value by 25, since there are 25 students:

$$\frac{2125}{25} = 85$$

The class interval arithmetic mean is 85.
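
The same calculation can be written as a short Python sketch. The variable names are chosen for this illustration, and the interval end points are stored as (lower, upper) pairs.

```python
bounds = [(0, 50), (50, 100), (100, 150), (150, 200), (200, 250)]
frequencies = [7, 10, 5, 2, 1]

# Midpoint of each interval, weighted by how many observations fall in it.
midpoints = [(lower + upper) / 2 for lower, upper in bounds]
total_value = sum(m * f for m, f in zip(midpoints, frequencies))
class_mean = total_value / sum(frequencies)

print(midpoints)     # [25.0, 75.0, 125.0, 175.0, 225.0]
print(total_value)   # 2125.0
print(class_mean)    # 85.0
```

The result, 85, differs from the mean of 90.4 found from the individual observations, which shows the loss of precision that comes with grouping the data.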