Lesson Two - Frequency Distributions and Graphs
1. Frequency Distributions
Definitions:
Raw data - Data collected in their original form
Class - a specific grouping within the data
Class limits - the upper and lower values in a class, having the
same decimal place value as the data (See page 25 for example)
Frequency (f) - the number of values in a specific class
Range (R) - highest data value minus the lowest data value in a data set. Used to help determine the number of classes in a distribution.
Frequency distribution - A method of organizing data in some meaningful way (using classes and frequencies) to allow the researcher to describe the data, draw conclusions, or make inferences.
Class boundaries - the upper and lower values of a class for a
grouped frequency distribution
Class width - the difference between the upper class boundary
and the lower class boundary
Class midpoint (Xm) - (lower boundary + upper boundary)/2 (useful for
graphing and when computing the mean and standard deviation of a data set)
Cumulative frequency - the sum of the frequencies accumulated up to the
upper boundary of a class
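To make the last few definitions concrete, here is a minimal sketch in Python. The class boundaries and frequencies are made-up numbers for illustration only; they do not come from the textbook.

```python
# Hypothetical grouped distribution (boundaries and frequencies are invented).
lower_boundaries = [5, 10, 15, 20]
upper_boundaries = [10, 15, 20, 25]
frequencies = [3, 7, 4, 2]

# Class width = upper class boundary - lower class boundary
widths = [hi - lo for lo, hi in zip(lower_boundaries, upper_boundaries)]

# Class midpoint (Xm) = (lower boundary + upper boundary) / 2
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_boundaries, upper_boundaries)]

# Cumulative frequency = running total of the frequencies
cumulative = []
running_total = 0
for f in frequencies:
    running_total += f
    cumulative.append(running_total)

print(widths)      # [5, 5, 5, 5]
print(midpoints)   # [7.5, 12.5, 17.5, 22.5]
print(cumulative)  # [3, 10, 14, 16]
```

Notice that the last cumulative frequency (16) equals the total number of data values.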
Rules for creating a frequency distribution
A. Use between 5 and 20 classes (SEE NOTE “A”)
B. Class width should be an ODD number (SEE NOTE “B”)
C. Classes must be mutually exclusive (SEE NOTE “C”)
D. Classes must be continuous (SEE NOTE “D”)
E. Classes must be exhaustive (SEE NOTE “E”)
F. Classes must be equal in width (exception: open-ended classes) (SEE NOTE “F”)
NOTES:
To start with, realize that these rules are not unlike the Pirates’ Code: they are more like guidelines. Some are flexible, others are not.
A. This is a guideline. What if you are grouping students by grade level in a high school (Freshman, Sophomore, Junior, Senior)? Well, you are only going to have four groups then. Common sense has to take over here.
B. Can it be an even number? Does it HAVE to be ODD? Sure, it can be even. But you will see later that having an odd class width can be very helpful. Trust me on this for now.
C. Mutually exclusive? What’s that? All this means is that each data item in your data set goes into JUST ONE GROUP. No single data item can or ever will go into more than one group. That’s all this means. Now, someone will ask: could you have non-mutually exclusive conditions? Sure, but if you do, many of the statistical rules in this course will not apply. So, you are to assume MUTUALLY EXCLUSIVE CONDITIONS in this course AT ALL TIMES (this effectively IS a rule, not a guideline).
D. Continuous simply means that all groups are accounted for. Imagine a high school in which there were no sophomores for some crazy reason. Well, you can’t just ignore the “sophomore” group. Even if its frequency is ZERO, the sophomore group must still be included in your distribution. (This, also, effectively IS a rule, not a guideline.) We will abide by this in all situations in the class.
E. Exhaustive is not meant to describe how tired you are. It is meant to say that all your data must be included in your distribution. You cannot throw out data just because you feel like it. If the data is invalid for some reason (improperly measured, etc.), then, OK, you can throw it out, but VALID DATA MUST NEVER BE THROWN OUT. Otherwise, why did you collect that data? And why are you throwing it out? Are you trying to “doctor” the statistics? That’s often called “misrepresentation” (I call it “lying”). This effectively IS also a rule, not a guideline. We will abide by this in all situations in the class.
F. Classes are equal in width: here’s an example. Suppose the first group is “5-10”, then the second group is “10-15”, then the third group is “15-20”, and so on. In this case each class has a width of 5 (5 to 10 is “5” apart, as is 10 to 15, as well as 15 to 20). The classes are equal in width.
Suppose you decided to maintain mutually exclusive conditions and made some changes. Let’s leave the 1st group alone, but change the 2nd group to “6-10”, and the third group to “11-15”. Well, that’s OK, except look at the widths. The 1st one is still 5, but the 2nd group is now 4, and so is the 3rd group! Arghhhhh! OK, no problem. Make the 1st group “1-5”. Great! But now we are violating rules “D” and “E” (the “Zero” is no longer included). You can’t just throw data out!
You are now wondering how we obey rules “C” through “F” at the same time. Well, we can do it with the original scheme:
Group 1: 5-10, Group 2: 10-15, Group 3: 15-20, etc.
But WAIT! How do we maintain mutually exclusive conditions? The number “10” could go in either the first group or the second group (“15” has a similar problem). That violates rule “C”, right? Well, not if we adopt a convention (a rule that everyone agrees to live by). The convention is what I call the “bump-up” rule. When a number could go into either group, we all agree to put that number in the HIGHER group, not the lower group. So the group “5-10” will really have all the numbers from “5.0” to “9.999”, but it will not include the number “10”. By adopting this “bump-up rule” we can obey rule “C”.
In this class, we will bump up by CONVENTION.
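If it helps to see the bump-up rule spelled out, here is a small Python sketch. The function names `class_index` and `class_label` are made up for this illustration, and the starting point (5) and width (5) are simply the ones from the example above.

```python
def class_index(value, start, width):
    """Return the position (0, 1, 2, ...) of the class that holds `value`.

    Each class includes its lower edge and excludes its upper edge,
    so a value sitting on a shared edge is "bumped up" to the higher class.
    """
    return int((value - start) // width)


def class_label(value, start=5, width=5):
    """Return a label such as '10-15' for the class that holds `value`."""
    i = class_index(value, start, width)
    low = start + i * width
    return f"{low}-{low + width}"


print(class_label(9.999))  # 5-10
print(class_label(10))     # 10-15  (bumped up, not placed in 5-10)
print(class_label(15))     # 15-20  (bumped up, not placed in 10-15)
```

Every value lands in exactly one group, which is what rule “C” asks for.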
Constructing a grouped frequency distribution
1. Find the lowest and highest data values
2. Find the range (R = highest value - lowest value)
3. Select number of classes desired (between 5 and 20)
4. Find the class width (Range/number of classes, then round UP)
5. Select a starting point (usually lowest data value), add class width to get lower limits
6. Find the upper class limits
7. Tally the data, find frequencies and cumulative frequency
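The seven steps above can be sketched in Python as follows. The raw data are invented for illustration, and the sketch assumes the bump-up convention (each class includes its lower edge and excludes its upper edge), so the upper edge of one class is the lower edge of the next.

```python
import math
from itertools import accumulate

# Hypothetical raw data (invented for this example).
data = [12, 17, 21, 8, 15, 30, 25, 9, 14, 19, 22, 27, 11, 16, 18]

# Step 1: lowest and highest data values
low, high = min(data), max(data)

# Step 2: range
data_range = high - low

# Step 3: number of classes (our choice, between 5 and 20)
num_classes = 5

# Step 4: class width = range / number of classes, rounded UP
width = math.ceil(data_range / num_classes)

# Steps 5 and 6: start at the lowest value and keep adding the class width
lowers = [low + i * width for i in range(num_classes)]
uppers = [lo + width for lo in lowers]

# Step 7: tally (bump-up convention), then accumulate
frequencies = [0] * num_classes
for x in data:
    frequencies[(x - low) // width] += 1
cumulative = list(accumulate(frequencies))

for lo, hi, f, cf in zip(lowers, uppers, frequencies, cumulative):
    print(f"{lo}-{hi}: f = {f}, cumulative f = {cf}")
# 8-13: f = 4, cumulative f = 4
# 13-18: f = 4, cumulative f = 8
# 18-23: f = 4, cumulative f = 12
# 23-28: f = 2, cumulative f = 14
# 28-33: f = 1, cumulative f = 15
```

One caution about this sketch: if Range/number of classes comes out to a whole number, the highest data value sits exactly on the last upper edge and would bump up past the last class, so you would need one more class (or a slightly wider width) to stay exhaustive.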
Why do we construct frequency distributions?
1. To organize the data in a meaningful way
2. To determine the nature or shape of the distribution
3. To compute measures of central tendency, variation, and position
4. To make comparisons among different data sets
5. To draw charts and graphs for data presentation
Statistical Charts - A way of presenting data in a study so that it can
be better understood by those who benefit from reading the study.
Three Major Types of Graphs:
A. Histogram - Displays data by using vertical bars of varying
heights to represent the frequencies.
B. Frequency Polygon - Displays data by using lines that connect points
plotted for the frequencies at the midpoints of the classes.
C. Ogive - (pronounced "O-jive") Displays the cumulative frequencies for classes in a frequency distribution
Constructing the Histogram
a. Draw and label the x and y axes
b. Represent the class boundaries on the x-axis and frequency on the y-axis
c. Draw in the bars
Constructing the Frequency Polygon
a. Find the midpoints of each class in the frequency distribution
b. Draw and label the x and y axes
c. Represent the class midpoints on the x-axis and frequency on the y-axis
d. Plot the frequency values for each class midpoint
e. Connect the dots, adding lines down to the x-axis at the points where the midpoints of a class before the first class and a class after the last class would have been located.
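Here is a short Python sketch of the points you would plot and connect for a frequency polygon. The midpoints and frequencies are made up, and the sketch reads step e as closing the polygon on the x-axis one class width before the first midpoint and one class width after the last midpoint (that reading is an assumption).

```python
# Hypothetical class midpoints and frequencies (invented for this example).
midpoints = [7.5, 12.5, 17.5, 22.5]
frequencies = [3, 7, 4, 2]

# Equal class widths mean neighboring midpoints are one class width apart.
width = midpoints[1] - midpoints[0]

# Step d: one point per class; step e: add a zero-frequency point at each end.
points = (
    [(midpoints[0] - width, 0)]
    + list(zip(midpoints, frequencies))
    + [(midpoints[-1] + width, 0)]
)

print(points)
# [(2.5, 0), (7.5, 3), (12.5, 7), (17.5, 4), (22.5, 2), (27.5, 0)]
```

Connecting these points in order, left to right, gives the polygon.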
Constructing the Ogive (Cumulative Frequency Graph)
a. Find the cumulative frequency for each class
b. Draw and label the x and y axes
c. Represent the class boundaries on the x-axis and cumulative frequency on the y-axis
d. Plot the cumulative frequency values for each upper class boundary
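The ogive points can be worked out the same way. In this Python sketch the boundaries and frequencies are again invented; starting the graph at zero on the first lower boundary is a common practice that I am assuming here, not something stated in the steps above.

```python
from itertools import accumulate

# Hypothetical class boundaries and frequencies (invented for this example).
lower_boundaries = [5, 10, 15, 20]
upper_boundaries = [10, 15, 20, 25]
frequencies = [3, 7, 4, 2]

# Step a: cumulative frequency for each class
cumulative = list(accumulate(frequencies))

# Step d: plot each cumulative frequency at its upper class boundary.
# Assumed extra: begin at (first lower boundary, 0) so the line starts on the x-axis.
ogive_points = [(lower_boundaries[0], 0)] + list(zip(upper_boundaries, cumulative))

print(ogive_points)
# [(5, 0), (10, 3), (15, 10), (20, 14), (25, 16)]
```

Because cumulative frequencies only add up, the y-values never decrease as you move to the right.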
Other Types of Graphs
A. Pareto chart - Used to represent a frequency distribution for a categorical variable
B. Time series graph - Displays data that occur over a specific period of time
C. Pie chart - Displays data as a circle divided into wedges according to the percentage of frequencies in each category of the distribution
Constructing the Pareto Chart
a. Arrange the data from largest to smallest according to frequency
b. Draw and label the x and y axes
c. Represent the categories on the x-axis and frequency on the y-axis
d. Draw the bars according to the frequencies (make bars the same width)
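Step a is the one that makes a Pareto chart different from an ordinary bar chart. A minimal Python sketch of that ordering, using made-up categories and counts:

```python
# Hypothetical categorical data (categories and counts are invented).
counts = {
    "Late delivery": 12,
    "Damaged item": 30,
    "Wrong item": 7,
    "Billing error": 18,
}

# Step a: arrange from largest to smallest according to frequency
ordered = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)

for category, frequency in ordered:
    print(category, frequency)
# Damaged item 30
# Billing error 18
# Late delivery 12
# Wrong item 7
```

The bars are then drawn left to right in this order, all with the same width.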
Constructing the Time series graph
a. Draw and label the x and y axes
b. Represent the time units on the x-axis and data described on the y-axis
c. Plot each point
d. Draw a line connecting the points
Constructing the Pie chart
a. Convert each class frequency into a proportional part of the circle (degrees = f/n * 360)
b. Convert each class frequency into a percentage
c. Draw and label each section of the circle (use a protractor and a compass, or a spreadsheet program will do this for you)
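Steps a and b are just arithmetic, so here is a small Python sketch of them. The class counts are invented for illustration; n is the total of all the frequencies.

```python
# Hypothetical frequencies by category (invented for this example).
counts = {"Freshman": 4, "Sophomore": 2, "Junior": 1, "Senior": 1}

n = sum(counts.values())  # total number of data values

# Step a: degrees = f / n * 360
degrees = {category: f / n * 360 for category, f in counts.items()}

# Step b: percentage = f / n * 100
percentages = {category: f / n * 100 for category, f in counts.items()}

print(degrees)      # {'Freshman': 180.0, 'Sophomore': 90.0, 'Junior': 45.0, 'Senior': 45.0}
print(percentages)  # {'Freshman': 50.0, 'Sophomore': 25.0, 'Junior': 12.5, 'Senior': 12.5}
```

As a check on your own work, the degrees should add up to 360 and the percentages to 100 (allowing for rounding).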
Misleading Graphs
A. Inappropriately truncating the axis of a graph (Figs. 2-14, 2-15, and 2-16)
B. Displaying a one-dimensional change in two dimensions (Fig. 2-17)
HOMEWORK: Read Chapters 1 and 2 in the textbook.