Lesson Two - Frequency Distributions and Graphs

1.  Frequency Distributions

Definitions:

Raw data - Data collected in their original form

Class - a specific grouping within the data

Class limits - the upper and lower values in a class, having the same decimal place value as the data (See page 25 for example)

Frequency (f) - the number of values in a specific class

Range (R) - highest data value minus the lowest data value in a data set.  Used to help determine the number of classes in a distribution.

Frequency distribution - A method of organizing data in some meaningful way (using classes and frequencies) to allow the researcher to describe the data, draw conclusions, or make inferences.

• Categorical - used for data that can be placed in specific categories, such as nominal or ordinal level data (See example 2-1, page 26)
• Ungrouped - used when a distribution has a small range and individual data values can be used instead of classes (See example 2-2, pages 27-28)
• Grouped - used when the range is large and classes within the data set are needed (See example 2-3, page 31)
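As a rough sketch of the categorical and ungrouped cases, the Python below tallies each distinct value as its own class. The blood-type and sibling data are made up for illustration, and collections.Counter is simply one convenient way to do the tally.

```python
from collections import Counter

# Hypothetical nominal-level data: blood types of 10 donors.
blood_types = ["A", "O", "B", "O", "AB", "A", "O", "O", "B", "A"]

# Categorical distribution: each category is its own class.
categorical = Counter(blood_types)

# Hypothetical small-range data: number of siblings for 10 students.
siblings = [0, 2, 1, 1, 3, 2, 1, 0, 1, 2]

# Ungrouped distribution: each individual data value is its own class.
ungrouped = Counter(siblings)

for value in sorted(ungrouped):
    print(value, ungrouped[value])
```

In both cases the frequencies add back up to the number of data items, which is a quick check that nothing was left out.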

Class boundaries - the upper and lower values that separate adjacent classes in a grouped frequency distribution, so that no gaps are left between classes

Class width - the difference between the upper class boundary and the lower class boundary

Class midpoint (Xm) - (lower boundary + upper boundary)/2 (useful for graphing and when computing the mean and standard deviation of a data set)
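Here is a small sketch of the boundary, width, and midpoint arithmetic. It assumes a common textbook convention that these notes do not spell out: each boundary sits half a measurement unit beyond the class limit (for whole-number data, limits 24-30 give boundaries 23.5-30.5). The function name and the unit parameter are made up for this example.

```python
def class_summary(lower_limit, upper_limit, unit=1):
    """Return (lower boundary, upper boundary, width, midpoint) for one class.

    Assumption: boundaries lie half a unit outside the class limits,
    where unit is the smallest step in the data (1 for whole numbers).
    """
    lower_boundary = lower_limit - unit / 2
    upper_boundary = upper_limit + unit / 2
    width = upper_boundary - lower_boundary
    midpoint = (lower_boundary + upper_boundary) / 2
    return lower_boundary, upper_boundary, width, midpoint


odd_width = class_summary(24, 30)    # width 7, midpoint 27.0
even_width = class_summary(24, 29)   # width 6, midpoint 26.5

print(odd_width)
print(even_width)
```

Under this convention an odd class width gives a midpoint with the same place value as the data, while an even width gives a midpoint that falls between two data values.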

Cumulative frequency - the sum of the frequencies accumulated up to the upper boundary of a class
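Cumulative frequency is just a running total of the class frequencies. A minimal sketch, using made-up frequencies for four classes:

```python
from itertools import accumulate

# Hypothetical class frequencies, listed from the lowest class to the highest.
frequencies = [3, 5, 8, 4]

# Running total: each entry is the sum of all frequencies up to that class.
cumulative = list(accumulate(frequencies))

print(cumulative)  # [3, 8, 16, 20]
```

The last cumulative frequency always equals the total number of data values.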

Rules for creating a frequency distribution

A.  Use between 5 and 20 classes                         (SEE NOTE “A”)

B.  Class width should be an ODD number           (SEE NOTE “B”)

C.  Classes must be mutually exclusive             (SEE NOTE “C”)

D.  Classes must be continuous                           (SEE NOTE “D”)

E.  Classes must be exhaustive                            (SEE NOTE “E”)

F.  Classes must be equal in width (exception: open-ended classes)   (SEE NOTE “F”)

NOTES:

To start with, realize that these rules are not unlike the Pirates’ Code: they are more like guidelines.  Some are flexible, others are not...

A. This is a guideline. What if you are doing the grades in a high school (Freshman, Sophomore, Junior, Senior)? Well, then you are only going to have four groups. Common sense has to take over here.

B. Can it be an even number? Does it HAVE to be ODD?  Sure, it can be even. But you will see later that having an odd class width can be very helpful.  Trust me on this for now.

C. Mutually Exclusive? What’s that?  All this means is that every data item in your data set goes into JUST ONE GROUP.  No single data item can or ever will go into more than one group.  That’s all this means.  Now, someone will ask: could you have non-mutually exclusive conditions? Sure, but if you do, many of the statistical rules in this course will not apply.  So, you are to assume MUTUALLY EXCLUSIVE CONDITIONS in this course AT ALL TIMES (This effectively IS a rule, not a guideline).

D.  Continuous simply means that all groups are accounted for. Imagine a high school in which there were no sophomores for some crazy reason.  Well, you can’t just ignore the “sophomore” group. Even if its frequency is ZERO, the sophomore group must still be included in your distribution. (This, also, effectively IS a rule, not a guideline).  We will abide by this in all situations in the class.

E. Exhaustive is not meant to describe how tired you are. It is meant to say that all your data must be included in your data distribution.  You cannot just throw out data, just because you feel like it.  If the data is invalid for some reason (improperly measured, etc.), then, OK, you can throw it out, but VALID DATA MUST NEVER BE THROWN OUT.  Otherwise, why did you collect that data?  And why are you throwing it out? Are you trying to “doctor” the statistics?  That’s often called “misrepresentation” (I call it “lying”). This effectively IS also a rule, not a guideline.  We will abide by this in all situations in the class.

F. Classes are equal in width: here’s an example.  Suppose the first group is “5-10”, the second group is “10-15”, the third group is “15-20”, and so on.  In this case each class has a width of 5 (5 to 10 is 5 apart, as is 10 to 15, as well as 15 to 20).  The classes are equal in width.

Suppose you wanted to make sure the groups were mutually exclusive and made some changes. Let’s leave the 1st group alone, but change the 2nd group to “11-15”, and the third group to “16-20”.  Well, that’s ok, except look at the widths. The 1st one is still 5 (10 minus 5), but the 2nd group is now 4 (15 minus 11), and so is the 3rd group!  Arghhhhh!

Ok, no problem.  Make the 1st group “6-10”.  Great! But now we are violating rules “D” and “E” (the “5” is no longer included).  You can’t just throw data out!!!!!

You are now wondering how we obey rules “C” through “F” at the same time.  Well, we can do it with the original scheme:

Group 1:  5-10, Group 2:  10-15,  Group 3:  15-20, etc.

But WAIT!!!!! How do we maintain mutually exclusive conditions?  The number “10” could go in either the first group or the second group (“15” has a similar problem).  That violates rule “C”, right?  Well, not if we adopt a convention (a rule that everyone agrees to live by). The convention is what I call the “bump-up” rule. When a number could go into either group, we all agree to put that number in the HIGHER group, not the lower group. So the group “5-10” will really have all the numbers from “5.0” up to “9.999...”, but it will not include the number “10”. By adopting this “bump-up rule” we can obey rule “C”.

In this class, we will bump up by CONVENTION.
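The bump-up rule can be sketched in a few lines of Python. The function name is made up for this example, and the default start and width match the groups 5-10, 10-15, 15-20 used above; each group is treated as including its lower end but not its upper end.

```python
def bump_up_group(value, start=5, width=5):
    """Return the 0-based group number for value under the bump-up rule.

    Group 0 covers start up to (but not including) start + width,
    group 1 covers the next stretch of the same width, and so on.
    """
    if value < start:
        raise ValueError("value falls below the first group")
    # Floor division sends a value sitting on a shared edge to the HIGHER group.
    return int((value - start) // width)


print(bump_up_group(9.999))  # 0 -> still in the group "5-10"
print(bump_up_group(10))     # 1 -> bumped up to the group "10-15"
print(bump_up_group(15))     # 2 -> bumped up to the group "15-20"
```

Because every value gets exactly one group number, rule "C" holds automatically.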

Constructing a grouped frequency distribution

1.  Find the lowest and highest data values

2.  Find the range (R = highest value - lowest value)

3.  Select number of classes desired (between 5 and 20)

4.  Find the class width (Range/number of classes, then round UP)

5.  Select a starting point (usually lowest data value), add class width to get lower limits

6.  Find the upper class limits

7.  Tally the data, find frequencies and cumulative frequency
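The seven steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the data are made-up whole numbers, the starting point is the lowest data value, and values on a shared edge are bumped up to the higher class. The function name is hypothetical.

```python
import math
from itertools import accumulate


def grouped_distribution(data, num_classes):
    """Return rows of (lower, upper, frequency, cumulative frequency)."""
    low, high = min(data), max(data)                      # step 1
    data_range = high - low                               # step 2
    width = max(1, math.ceil(data_range / num_classes))   # steps 3-4, round UP
    # With bump-up, a highest value that lands exactly on a class edge
    # needs one extra class so that no data are thrown out.
    classes_needed = (high - low) // width + 1
    freqs = [0] * classes_needed
    for x in data:                                        # step 7: tally
        freqs[(x - low) // width] += 1
    cum = list(accumulate(freqs))
    return [
        (low + i * width, low + (i + 1) * width, freqs[i], cum[i])  # steps 5-6
        for i in range(classes_needed)
    ]


# Hypothetical data set of 20 whole-number values.
data = [12, 15, 21, 27, 33, 18, 24, 30, 36, 41,
        45, 22, 29, 38, 19, 26, 31, 34, 28, 25]

rows = grouped_distribution(data, 6)
for lower, upper, f, cf in rows:
    print(f"{lower}-{upper}: f = {f}, cf = {cf}")
```

Here the range is 33, so 33/6 = 5.5 rounds up to a class width of 6, and the last cumulative frequency comes out to 20, the number of data values.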

Why do we construct frequency distributions?

1.  To organize the data in a meaningful way

2.  To determine the nature or shape of the distribution

3.  To compute measures of central tendency, variation, and position

4.  To make comparisons among different data sets

5.  To draw charts and graphs for data presentation

Statistical Charts - A way of presenting data in a study so that it can be better understood by those who benefit from reading the study.

Three Major Types of Graphs:

A.  Histogram - Displays data by using vertical bars of varying heights to represent the frequencies.

B.  Frequency Polygon - Displays data by using lines that connect points plotted for the frequencies at the midpoints of the classes.

C.  Ogive - (pronounced "O-jive") Displays the cumulative frequencies for classes in a frequency distribution.

Constructing the Histogram

a.  Draw and label the x and y axes

b.  Represent the class boundaries on the x-axis and frequency on the y-axis

c.  Draw in the bars
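For a quick look at the shape before drawing anything by hand, a text version of the histogram can be printed. This sketch turns the graph on its side (one row per class instead of one vertical bar per class), and the classes and frequencies are made up for illustration.

```python
# Hypothetical grouped distribution: (lower boundary, upper boundary, frequency).
classes = [(12, 18, 2), (18, 24, 4), (24, 30, 6),
           (30, 36, 4), (36, 42, 3), (42, 48, 1)]


def text_histogram(classes):
    """Return one line per class, with one '#' per data value in that class."""
    lines = []
    for lower, upper, f in classes:
        lines.append(f"{lower:>3}-{upper:<3} | {'#' * f}")
    return lines


for line in text_histogram(classes):
    print(line)
```

The length of each row of "#" marks plays the role of the bar height on the real graph.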

Constructing the Frequency Polygon

a.  Find the midpoints of each class in the frequency distribution

b.  Draw and label the x and y axes

c.  Represent the class midpoints on the x-axis and frequency on the y-axis

d.  Plot the frequency values for each class midpoint

e.  Connect the dots, then extend the line down to the x-axis at each end, at the points where the midpoints of the classes just before the first class and just after the last class would have been located.
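The points of a frequency polygon can be worked out before plotting. In this sketch the class edges and frequencies are made up, all classes are assumed to have the same width, and the function name is hypothetical. The two extra points with frequency 0 are the midpoints of the imaginary classes on either end.

```python
def polygon_points(lower_boundaries, width, freqs):
    """Return (midpoint, frequency) pairs, closed off with a zero at each end."""
    midpoints = [lower + width / 2 for lower in lower_boundaries]
    points = list(zip(midpoints, freqs))
    before_first = (midpoints[0] - width, 0)   # imaginary class before the first
    after_last = (midpoints[-1] + width, 0)    # imaginary class after the last
    return [before_first] + points + [after_last]


points = polygon_points([12, 18, 24, 30, 36, 42], 6, [2, 4, 6, 4, 3, 1])
for x, y in points:
    print(x, y)
```

Plotting these pairs in order and joining them with straight lines gives the polygon.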

Constructing the Ogive (Cumulative Frequency Graph)

a.  Find the cumulative frequency for each class

b.  Draw and label the x and y axes

c.  Represent the class boundaries on the x-axis and cumulative frequency on the y-axis

d.  Plot the cumulative frequency values for each upper class boundary, then connect the points with line segments
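The points for an ogive can be listed the same way. This sketch uses made-up frequencies and equal class widths, and it assumes the common convention of starting the graph at a cumulative frequency of 0 at the lowest class boundary; the function name is hypothetical.

```python
from itertools import accumulate


def ogive_points(lowest_boundary, width, freqs):
    """Return (upper class boundary, cumulative frequency) pairs for an ogive."""
    points = [(lowest_boundary, 0)]  # assumed starting point: nothing counted yet
    for i, cf in enumerate(accumulate(freqs), start=1):
        points.append((lowest_boundary + i * width, cf))
    return points


points = ogive_points(12, 6, [2, 4, 6, 4, 3, 1])
print(points)
```

The y-values never decrease, and the last one equals the total number of data values (20 here).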

Other Types of Graphs

A.  Pareto chart - Used to represent a frequency distribution for a categorical variable

B.  Time series graph - Displays data that occur over a specific period of time

C.  Pie chart - displays data as a circle divided into wedges according to the percentage of frequencies in each category of the distribution

Constructing the Pareto Chart

a.  Arrange the data from largest to smallest according to frequency

b.  Draw and label the x and y axes

c.  Represent the categories on the x-axis and frequency on the y-axis

d.  Draw the bars according to the frequencies (make bars the same width)
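The only computational step in a Pareto chart is the ordering of the categories. A minimal sketch, with made-up complaint categories and counts:

```python
# Hypothetical categorical frequencies (customer complaints by type).
complaints = {
    "Late delivery": 12,
    "Damaged item": 30,
    "Wrong item": 7,
    "Billing error": 18,
}

# Arrange the categories from largest to smallest frequency.
pareto_order = sorted(complaints.items(), key=lambda item: item[1], reverse=True)

for category, f in pareto_order:
    print(f"{category:<14} {f}")
```

The bars are then drawn left to right in this order, all with the same width.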

Constructing the Time series graph

a.  Draw and label the x and y axes

b.  Represent the time units on the x-axis and data described on the y-axis

c.  Plot each point

d.  Draw a line connecting the points

Constructing the Pie chart

a.  Convert each class frequency into a proportional part of the circle (degrees = (f / n) * 360)

b.  Convert each class frequency into a percentage (% = (f / n) * 100)

c.  Draw and label each section of the circle (use a protractor and a compass, or a spreadsheet program will do this for you)
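The degree and percentage conversions for a pie chart can be checked with a short Python sketch. The category counts are made up and the function name is hypothetical. The code multiplies before dividing, which gives the same result as f/n*360 while avoiding small floating-point rounding noise.

```python
def pie_slices(freqs):
    """Return {category: (degrees, percent)} for a pie chart."""
    n = sum(freqs.values())
    return {
        category: (f * 360 / n, f * 100 / n)
        for category, f in freqs.items()
    }


# Hypothetical frequencies for four categories (n = 25).
slices = pie_slices({"A": 5, "B": 7, "O": 9, "AB": 4})

for category, (degrees, percent) in slices.items():
    print(f"{category}: {degrees} degrees, {percent}%")
```

As a check, the degrees should add up to 360 and the percentages to 100.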