Experimental Design and Analysis, a free PDF ebook by Howard J. Seltman dated November 1, 2009, consists of 14 pages. The PDF file is provided by www.stat.cmu.edu and has been available on pdfpedia since December 12, 2011.


Experimental Design and Analysis
Howard J. Seltman
November 1, 2009

Preface
This book is intended as required reading material for my course, Experimental Design for the Behavioral and Social Sciences, a second-level statistics course for undergraduate students in the College of Humanities and Social Sciences at Carnegie Mellon University. This course is also cross-listed as a graduate-level course for Masters and PhD students (in fields other than Statistics), and supplementary material is included for this level of study.
Over the years the course has grown to include students from dozens of majors beyond Psychology and the Social Sciences and from all of the Colleges of the University. This is appropriate because Experimental Design is fundamentally the same for all fields. This book leans toward examples from the behavioral and social sciences, but it includes examples from a full range of fields.
In truth, a better title for the course is Experimental Design and Analysis, and that is the title of this book. Experimental Design and Statistical Analysis go hand in hand, and neither can be understood without the other. Only a small fraction of the myriad statistical analytic methods is covered in this book, but my rough guess is that these methods cover 60%-80% of what you will read in the literature and what is needed for analysis of your own experiments. In other words, I am guessing that the first 10% of all methods available are applicable to about 80% of analyses. Of course, it is well known that 87% of statisticians make up probabilities on the spot when they don’t know the true values. :)
Real examples are usually better than contrived ones, but real experimental
data is of limited availability. Therefore, in addition to some contrived examples
and some real examples, the majority of the examples in this book are based on
simulation of data designed to match real experiments.
I need to say a few things about the difficulties of learning about experimental design and analysis. A practical working knowledge requires understanding many concepts and their relationships. Luckily much of what you need to learn agrees with common sense, once you sort out the terminology. On the other hand, there is no ideal logical order for learning what you need to know, because everything relates to, and in some ways depends on, everything else. So be aware: many concepts are only loosely defined when first mentioned, then further clarified later when you have been introduced to other related material. Please try not to get frustrated with some incomplete knowledge as the course progresses. If you work hard, everything should tie together by the end of the course.
In that light, I recommend that you create your own “concept maps” as the course progresses. A concept map is usually drawn as a set of ovals with the names of various concepts written inside and with arrows showing relationships among the concepts. Often it helps to label the arrows. Concept maps are a great learning tool that helps almost every student who tries them. They are particularly useful for a course like this, for which the main goal is to learn the relationships among many concepts so that you can learn to carry out specific tasks (design and analysis in this case). A second-best alternative to making your own concept maps is to further annotate the ones that I include in this text.
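A concept map is, in effect, a directed graph whose arrows carry labels. For readers who would like to keep a concept map in a file rather than on paper, the short Python sketch below stores each arrow as a (starting concept, label, ending concept) triple and prints the map as text. It is not part of the course materials: the function names `build_concept_map` and `describe`, and the three example arrows, are hypothetical illustrations to be replaced with your own concepts.

```python
from collections import defaultdict

# Hypothetical example arrows: (starting concept, arrow label, ending concept).
EDGES = [
    ("t-test", "produces a", "p-value"),
    ("p-value", "is computed from the", "null sampling distribution"),
    ("null sampling distribution", "assumes the", "null hypothesis"),
]


def build_concept_map(edges):
    """Group labeled arrows by the concept (oval) they start from."""
    graph = defaultdict(list)
    for source, label, target in edges:
        graph[source].append((label, target))
    return graph


def describe(graph):
    """Return one text line per arrow, ordered alphabetically by starting concept."""
    lines = []
    for source in sorted(graph):
        for label, target in graph[source]:
            lines.append(f"{source} --{label}--> {target}")
    return lines


concept_map = build_concept_map(EDGES)
for line in describe(concept_map):
    print(line)
```

Keeping arrows as plain triples makes it easy to add, relabel, or remove a relationship as your understanding grows during the course.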
This book is on the World Wide Web at http://www.stat.cmu.edu/~hseltman/309/Book/Book.pdf and any associated data files are at http://www.stat.cmu.edu/~hseltman/309/Book/data/.
One key idea in this course is that you cannot really learn statistics without doing statistics. Even if you never analyze data again, the hands-on experience you gain from analyzing data in labs, homework, and exams will take your understanding of, and ability to read about, other people’s experiments and data analyses to a whole new level. I don’t think it makes much difference which statistical package you use for your analyses, but for practical reasons we must standardize on a particular package in this course, and that is SPSS, mostly because it is one of the packages most likely to be available to you in your future schooling and work. You will find a chapter on learning to use SPSS in this book. In addition, many of the other chapters end with “How to do it in SPSS” sections.
There are some typographical conventions you should know about. First, in a
non-standard way, I use capitalized versions of Normal and Normality because I
don’t want you to think that the Normal distribution has anything to do with the
ordinary conversational meaning of “normal”.
Another convention is that optional material has a gray background:
I have tried to use only the minimally required theory and mathematics
for a reasonable understanding of the material, but many students want
a deeper understanding of what they are doing statistically. Therefore
material in a gray box like this one should be considered optional extra
theory and/or math.
Periodically I will summarize key points (i.e., that which is roughly sufficient to achieve a B in the course) in a box:
Key points are in boxes. They may be useful at review time to help
you decide which parts of the material you know well and which you
should re-read.
Less often I will sum up a larger topic to make sure you haven’t “lost the forest
for the trees”. These are double boxed and start with “In a nutshell”:
In a nutshell: You can make better use of the text by paying attention
to the typographical conventions.
Chapter 1 is an overview of what you should expect to learn in this course. Chapters 2 through 4 are a review of what you should have learned in a previous course. Depending on how much you remember, you should skim them or read through them carefully. Chapter 5 is a quick start to SPSS. Chapter 6 presents the statistical foundations of experimental design and analysis in the case of a very simple experiment, with emphasis on the theory that needs to be understood to use statistics appropriately in practice. Chapter 7 introduces one-way ANOVA. Chapter 8 covers experimental design principles in terms of preventable threats to the acceptability of your experimental conclusions. Most of the remainder of the book discusses specific experimental designs and corresponding analyses, with continued emphasis on appropriate design, analysis, and interpretation. Special emphasis chapters include those on power, multiple comparisons, and model selection.
You may be interested in my background. I obtained my M.D. in 1979 and practiced clinical pathology for 15 years before returning to school to obtain my PhD in Statistics in 1999. As an undergraduate and as an academic pathologist, I carried out my own experiments and analyzed the results of other people’s experiments in a wide variety of settings. My hands-on experience ranges from techniques such as cell culture, electron autoradiography, gas chromatography-mass spectrometry, and determination of cellular enzyme levels to topics such as evaluating new radioimmunoassays, determining predictors of success in in vitro fertilization, and evaluating the quality of care in clinics vs. doctors’ offices, to name a few. Many of my opinions and hints about the actual conduct of experiments come from these experiences.
As an Associate Research Professor in Statistics, I continue to analyze data for many different clients as well as to work at expanding the frontiers of statistics. I have also tried hard to understand the spectrum of causes of confusion in students as I have taught this course repeatedly over the years. I hope that this experience will benefit you. I know that I continue to greatly enjoy teaching, and I am continuing to learn from my students.
Howard Seltman
August 2008
Contents
1 The Big Picture
  1.1 The importance of careful experimental design
  1.2 Overview of statistical analysis
  1.3 What you should learn here
2 Variable Classification
  2.1 What makes a “good” variable?
  2.2 Classification by role
  2.3 Classification by statistical type
  2.4 Tricky cases
3 Review of Probability
  3.1 Definition(s) of probability
  3.2 Probability mass functions and density functions
    3.2.1 Reading a pdf
  3.3 Probability calculations
  3.4 Populations and samples
  3.5 Parameters describing distributions
    3.5.1 Central tendency: mean and median
    3.5.2 Spread: variance and standard deviation
    3.5.3 Skewness and kurtosis
    3.5.4 Miscellaneous comments on distribution parameters
    3.5.5 Examples
  3.6 Multivariate distributions: joint, conditional, and marginal
    3.6.1 Covariance and Correlation
  3.7 Key application: sampling distributions
  3.8 Central limit theorem
  3.9 Common distributions
    3.9.1 Binomial distribution
    3.9.2 Multinomial distribution
    3.9.3 Poisson distribution
    3.9.4 Gaussian distribution
    3.9.5 t-distribution
    3.9.6 Chi-square distribution
    3.9.7 F-distribution
4 Exploratory Data Analysis
  4.1 Typical data format and the types of EDA
  4.2 Univariate non-graphical EDA
    4.2.1 Categorical data
    4.2.2 Characteristics of quantitative data
    4.2.3 Central tendency
    4.2.4 Spread
    4.2.5 Skewness and kurtosis
  4.3 Univariate graphical EDA
    4.3.1 Histograms
    4.3.2 Stem-and-leaf plots
    4.3.3 Boxplots
    4.3.4 Quantile-normal plots
  4.4 Multivariate non-graphical EDA
    4.4.1 Cross-tabulation
    4.4.2 Correlation for categorical data
    4.4.3 Univariate statistics by category
    4.4.4 Correlation and covariance
    4.4.5 Covariance and correlation matrices
  4.5 Multivariate graphical EDA
    4.5.1 Univariate graphs by category
    4.5.2 Scatterplots
  4.6 A note on degrees of freedom
5 Learning SPSS: Data and EDA
  5.1 Overview of SPSS
  5.2 Starting SPSS
  5.3 Typing in data
  5.4 Loading data
  5.5 Creating new variables
    5.5.1 Recoding
    5.5.2 Automatic recoding
    5.5.3 Visual binning
  5.6 Non-graphical EDA
  5.7 Graphical EDA
    5.7.1 Overview of SPSS Graphs
    5.7.2 Histogram
    5.7.3 Boxplot
    5.7.4 Scatterplot
  5.8 SPSS convenience item: Explore
6 t-test
  6.1 Case study from the field of Human-Computer Interaction (HCI)
  6.2 How classical statistical inference works
    6.2.1 The steps of statistical analysis
    6.2.2 Model and parameter definition
    6.2.3 Null and alternative hypotheses
    6.2.4 Choosing a statistic
    6.2.5 Computing the null sampling distribution
    6.2.6 Finding the p-value
    6.2.7 Confidence intervals
    6.2.8 Assumption checking
    6.2.9 Subject matter conclusions
    6.2.10 Power
  6.3 Do it in SPSS
  6.4 Return to the HCI example
7 One-way ANOVA
  7.1 Moral Sentiment Example
  7.2 How one-way ANOVA works
    7.2.1 The model and statistical hypotheses
    7.2.2 The F statistic (ratio)
    7.2.3 Null sampling distribution of the F statistic
    7.2.4 Inference: hypothesis testing
    7.2.5 Inference: confidence intervals
  7.3 Do it in SPSS
  7.4 Reading the ANOVA table
  7.5 Assumption checking
  7.6 Conclusion about moral sentiments
8 Threats to Your Experiment