Lesson 1:

The Nature of Probability and Statistics

What is statistics?

The science of conducting studies to collect, organize, summarize, analyze and draw conclusions from data.

Two types of statistics:

1. Descriptive - collection, organization, summarization & presentation of data

2. Inferential - generalizing from samples to populations, testing hypotheses, determining relationships among variables, and making predictions, all using probability theory
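The difference between the two types can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 1,000 heights is made up, and the names `population`, `sample` and `estimate_error` are hypothetical. The descriptive step summarizes the sample itself; the inferential step treats the sample mean as an estimate of the population mean.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: 1,000 heights in inches (made-up data).
population = [random.gauss(68, 3) for _ in range(1000)]

# Draw a sample of 50 subjects from the population.
sample = random.sample(population, 50)

# Descriptive: summarize the data that was actually collected.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Inferential: use the sample mean as an estimate of the population mean.
# In a real study the population mean is unknown; it is computed here
# only to show how close the estimate lands.
population_mean = statistics.mean(population)
estimate_error = abs(sample_mean - population_mean)

print(f"sample mean = {sample_mean:.2f}, sample sd = {sample_sd:.2f}")
print(f"population mean = {population_mean:.2f}, error = {estimate_error:.2f}")
```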

Some definitions:

Variables - Mathematical symbols that represent a number, but have no fixed value
Data - Value(s) that variables can assume (numbers, types, colors, etc.)
Data set - A collection of data values
Random Variables - Variables whose values are determined by chance
Population - All data values being studied
Sample - A subgroup of a given population

Questions

1. Give some examples of variables

a: Qualitative - can be placed into distinct categories: gender, color, religion

b: Quantitative - numerical, can be ordered or ranked: age, temperature, height

c: Discrete - assume values that can be counted (number of kids, days in a cycle)

d: Continuous - can assume all values between any two specific values (temp)

2. Boundaries - useful when considering continuous variables; they allow us to group values more easily
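A small sketch of how boundaries might be computed, assuming the common convention that a value recorded to the nearest unit stands for the interval half a unit on either side (the function name `boundaries` and its `unit` parameter are hypothetical, not from the lesson):

```python
def boundaries(recorded, unit=1.0):
    """Return (lower, upper) boundaries for a value recorded to the nearest `unit`."""
    half = unit / 2
    return (recorded - half, recorded + half)

# A height recorded as 73 inches covers every value from 72.5 up to 73.5.
print(boundaries(73))        # (72.5, 73.5)

# A value recorded to the nearest half unit has narrower boundaries.
print(boundaries(8.5, 0.5))  # (8.25, 8.75)
```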

Measurement Scales

1. Nominal: Classifies data into mutually exclusive categories in which NO order or ranking can be imposed on the data

• University departments
• Gender
• Hair color
• Marital status
• Race
• Political party
• Religion
• Zip codes!
• Area codes
2. Ordinal: Classifies data into categories that CAN be ranked; however, precise differences between the ranks do not exist
• Rating Scales (poor, average, good, excellent)
• Rankings (1st, 2nd...)
3. Interval: Same as above, except precise differences between the ranks DO exist; however, there is no meaningful zero
• IQ scores (there IS a difference between 120 and 121)
• Temperature (Zero degrees does NOT mean that there is no temperature!)
4. Ratio: Same as above, except a true zero exists. True ratios exist when the same variable is measured on two different members of the population
• Weight (He’s 300 lbs, she’s 150 lbs, thus he’s twice as heavy as she is)
• Height
• Age
• Time
• Salary
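The difference between the interval and ratio scales can be checked with a little arithmetic. The sketch below reuses the 300 lb / 150 lb example; the two temperatures (80 and 40 degrees Fahrenheit) are made-up values chosen for illustration. A ratio of weights survives a change of units because zero means "no weight" in both, while a ratio of temperatures changes with the scale because the zero point is arbitrary.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) * 5 / 9

LB_TO_KG = 0.45359237  # kilograms per pound

# Ratio scale: weight has a true zero, so the ratio is the same in any unit.
heavy_lb, light_lb = 300, 150
ratio_lb = heavy_lb / light_lb
ratio_kg = (heavy_lb * LB_TO_KG) / (light_lb * LB_TO_KG)

# Interval scale: temperature has no true zero, so the "ratio" depends
# on which scale is used and carries no real meaning.
warm_f, cool_f = 80, 40  # illustrative values
ratio_f = warm_f / cool_f
ratio_c = fahrenheit_to_celsius(warm_f) / fahrenheit_to_celsius(cool_f)

print(f"weight ratio: {ratio_lb:.1f} in lbs, {ratio_kg:.1f} in kg")
print(f"temperature 'ratio': {ratio_f:.1f} in F, {ratio_c:.1f} in C")
```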

Data Collection and Sampling Techniques

1. Data collection techniques

a. Direct observations (should be self-explanatory)
b. Reviewing records (weather temperatures over the last 50 years, etc.)
c. Surveys -

I. Telephone:

 Advantages: Cheap, people are more candid without face-to-face contact
 Disadvantages: Misses those without phones, those not at home, and those who do not answer

II. Mailed questionnaire:

III. Personal interview:

 Advantages: Can get in-depth responses
 Disadvantages: Interviewers must be trained (Q&A), expensive, interviewer may be biased in selection of respondents

2. Sampling Techniques

a. Random - Selected by using chance methods or random numbers

b. Systematic - Numbering each subject of the population and picking every kth subject
(See page 12 for an example: to get 50 out of 2000, take every 40th subject, with the first picked at random from subjects 1 to 40)

c. Stratified - Divide population into groups according to some characteristic, then random sampling from each group (sample from freshmen, sophomores, juniors and seniors; sample from officers and enlisted personnel, etc.)

d. Cluster - Using intact groups that are representative of a population, such as the residents in a retirement home, kids in one school, nurses in a hospital

e. Convenience - Use subjects that are convenient to study
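The first three techniques can be sketched in Python as follows. This is a rough illustration, not a prescribed procedure: the population of 2000 numbered subjects with class years is made up, and the function names (`random_sample`, `systematic_sample`, `stratified_sample`) are hypothetical. The systematic case follows the 50-out-of-2000 example, so k works out to 40.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 2000 numbered subjects, each with a class year.
YEARS = ["freshman", "sophomore", "junior", "senior"]
population = [{"id": i, "year": YEARS[i % 4]} for i in range(1, 2001)]

def random_sample(pop, n):
    """Random: every subject has the same chance of being picked."""
    return random.sample(pop, n)

def systematic_sample(pop, n):
    """Systematic: pick every kth subject, starting at a random point in the first k."""
    k = len(pop) // n            # 2000 // 50 = 40
    start = random.randrange(k)  # random position among the first k subjects
    return pop[start::k][:n]

def stratified_sample(pop, key, n_per_group):
    """Stratified: split the population by a characteristic, then sample each group."""
    groups = {}
    for subject in pop:
        groups.setdefault(subject[key], []).append(subject)
    picked = []
    for members in groups.values():
        picked.extend(random.sample(members, n_per_group))
    return picked

simple = random_sample(population, 50)
systematic = systematic_sample(population, 50)
stratified = stratified_sample(population, "year", 5)

print([s["id"] for s in systematic[:5]])  # consecutive ids differ by 40
print(len(simple), len(systematic), len(stratified))
```

Cluster and convenience sampling are left out of the sketch because they depend on how the intact groups or convenient subjects are defined in a particular study.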