-----Original Message-----
From: [email protected] [mailto:r-help-boun...@r-project.org] On Behalf Of Ben Ward
Sent: Thursday, January 06, 2011 2:00 PM
To: [email protected]
Subject: Re: [R] Assumptions for ANOVA: the right way to check the
normality
On 06/01/2011 20:29, Greg Snow wrote:
Some would argue to always use the kruskal wallis test since we never
know for sure if we have normality. Personally I am not sure that I
understand what exactly that test is really testing. Plus in your case
you are doing a two-way anova and kruskal.test does one-way, so it will
not work for your case. There are other non-parametric options.
Just read this and had queries of my own and comments on this subject:
Would one of these options be to rank the data before doing whatever
model or test you want to do? As I understand it, ranking keeps the order of
the data the same but pulls extreme cases closer to the rest. I am not an
expert, though.
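A minimal sketch of the rank-then-fit idea asked about here, on simulated data. The data frame d, its factors a and b, and the size of the effect are all made up for illustration; whether the resulting F tests can be trusted, particularly for interaction terms, is a separate question that this sketch does not answer.

```r
# Simulated, hypothetical data: a skewed response with two crossed factors.
set.seed(1)
d <- expand.grid(a = factor(c("a1", "a2")),
                 b = factor(c("b1", "b2", "b3")),
                 rep = 1:10)
d$y <- rexp(nrow(d)) + ifelse(d$a == "a2", 0.5, 0)

# Replace the response by its ranks (ties would get the average rank).
d$y_rank <- rank(d$y)

# Fit the same two-way model to the raw and to the ranked response.
fit_raw  <- lm(y ~ a * b, data = d)
fit_rank <- lm(y_rank ~ a * b, data = d)

# Compare the spread of the two responses and the two ANOVA tables.
summary(d$y)
summary(d$y_rank)
anova(fit_raw)
anova(fit_rank)
```

Ranking keeps the ordering of the observations but replaces the distances between them with equal steps, which is why extreme values end up no further from their neighbours than any other value.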
I've been doing lm() for my work, and I don't know if that makes an
assumption of normality (my data is not normal). And I'm unsure of any
other assumptions, as my texts don't really discuss them. I can, though,
comfortably evaluate a model using, say, residuals vs fitted values, F
values converted to p-values, resampling and confidence intervals, and by
looking at how the sums of squares terms add to the explanation of the
model. I've tried the plot() function to help graphically evaluate a model,
and I want to make sure I understand what it's showing me. I think the
first plot is showing me the model's fitted values vs the residuals, and
ideally, I think, the closer the points are to the red line the better.
The next plot is a Q-Q plot: the closer the points are to the line, the
more normal the model coefficients (or perhaps the data). I'm not sure
what the next two plots are. The first of them is titled Scale-Location,
and it looks to have the square root of the standardized residuals on y
and the fitted model values on x. Might this be similar to the first plot?
The final one is titled Residuals vs Leverage, which has standardized
residuals on y and leverage on x, and something called Cook's distance is
plotted as well.
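For reference, a small sketch of where the four plots come from. It uses the built-in cars data as a stand-in (an assumption for illustration; it is not the poster's model). By default plot() on an lm fit draws panels 1, 2, 3 and 5: residuals vs fitted values with a smooth curve, a normal Q-Q plot of the standardized residuals, the square root of the absolute standardized residuals vs fitted values (Scale-Location), and standardized residuals vs leverage with Cook's distance contours.

```r
# Hypothetical stand-in model; 'cars' ships with R and is not the poster's data.
fit <- lm(dist ~ speed, data = cars)

# Draw the four default diagnostic plots on one 2 x 2 panel.
op <- par(mfrow = c(2, 2))
plot(fit, which = c(1, 2, 3, 5))
par(op)

# The quantities behind the plots can also be extracted directly.
res_std <- rstandard(fit)       # standardized residuals (panels 2, 3 and 4)
lev     <- hatvalues(fit)       # leverage (x axis of Residuals vs Leverage)
cook    <- cooks.distance(fit)  # Cook's distance (contours in the last panel)

head(cbind(fitted       = fitted(fit),
           sqrt_abs_std = sqrt(abs(res_std)),
           leverage     = lev,
           cooks_d      = cook))
```

Looking at the extracted columns next to the plots can help confirm which quantity sits on which axis.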
Thanks,
Ben. W
Whether to use anova and other normality based tests is really a
matter of what assumptions you are willing to live with and what level
of "close enough" you are comfortable with. Consulting with a local
consultant with experience in these areas is useful if you don't have
enough experience to decide what you are comfortable with.
For your description, I would try the proportional odds logistic
regression, but again, you should probably consult with someone who has
experience rather than trying that on your own until you have more
training and experience.
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[email protected]
801.408.8111
From: Frodo Jedi [mailto:[email protected]]
Sent: Thursday, January 06, 2011 12:57 PM
To: Greg Snow; [email protected]
Subject: Re: [R] Assumptions for ANOVA: the right way to check the
normality
Ok,
I see ;-)
Let's put it this way then. When do I have to use the Kruskal-Wallis
test? I mean, when can I be sure that I have
to use it instead of ANOVA?
Thanks
Best regards
P.S. In addition, which non-parametric method corresponds
to a two-way ANOVA? Or do I have to
repeat the Kruskal-Wallis test many times?
________________________________
From: Greg Snow<[email protected]>
To: Frodo Jedi<[email protected]>; Robert Baer<[email protected]>;
"[email protected]"<[email protected]>
Sent: Thu, January 6, 2011 7:07:17 PM
Subject: RE: [R] Assumptions for ANOVA: the right way to check the
normality
Remember that a non-significant result (especially one that is still
near alpha like yours) does not give evidence that the null is true.
The reason that the 1st 2 tests below don't show significance is more
due to lack of power than some of the residuals being normal. The only
test that I would trust for this is SnowsPenultimateNormalityTest
(TeachingDemos package, the help page is more useful than the function
itself).
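The power argument can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population below is a made-up set of integer ratings from 1 to 7 (so never normal), the test is applied to raw samples rather than to model residuals, and the helper names draw_ratings and reject_rate are hypothetical.

```r
set.seed(42)

# Hypothetical population: integer ratings 1..7, which cannot be normal.
draw_ratings <- function(n) {
  sample(1:7, n, replace = TRUE,
         prob = c(0.05, 0.10, 0.25, 0.25, 0.20, 0.10, 0.05))
}

# Proportion of Shapiro-Wilk rejections at alpha = 0.05 for sample size n.
reject_rate <- function(n, reps = 500) {
  mean(replicate(reps, shapiro.test(draw_ratings(n))$p.value < 0.05))
}

# Same non-normal population, three different sample sizes.
rates <- sapply(c(15, 50, 170), reject_rate)
names(rates) <- c("n=15", "n=50", "n=170")
rates
```

The population is identical in all three cases, so any difference in how often the test rejects reflects sample size, not how normal the data are.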
But I think that you are mixing up 2 different concepts (a very
common misunderstanding). What is important if we want to do normal
theory inference is that the coefficients/effects/estimates are
normally distributed. Now since these coefficients can be shown to be
linear combinations of the error terms, if the errors are iid normal
then the coefficients are also normally distributed. So many people
want to show that the residuals come from a perfectly normal
distribution. But it is the theoretical errors, not the observed
residuals that are important (the observed residuals are not iid). You
need to think about the source of your data to see if this is a
reasonable assumption. Now I cannot fathom any universe (theoretical
or real) in which normally distributed errors added to means that they
are independent of will result in a finite set of integers, so an
assumption of exact normality is not reasonable (some may want to argue
this, but convincing me will be very difficult). But looking for exact
normality is a bit of a red herring, because we also have the Central
Limit Theorem, which says that if the errors are not normal (but still
iid) then the distribution of the coefficients will approach normality
as the sample size increases. This is what makes statistics doable
(because no real dataset entered into the computer is exactly normal).
The more important question is whether the residuals are "normal enough",
for which there is no definitive test (experience and plots help).
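A rough simulation of the Central Limit Theorem point, assuming a simple straight-line model with skewed iid errors. The design, the true coefficients and the helper name sim_slopes are made up for illustration.

```r
set.seed(7)

# Slope estimates from repeated fits with skewed (centred exponential) errors.
sim_slopes <- function(n, reps = 1000) {
  x <- seq(0, 1, length.out = n)       # fixed design
  replicate(reps, {
    y <- 1 + 2 * x + (rexp(n) - 1)     # iid errors with mean 0, not normal
    coef(lm(y ~ x))[["x"]]
  })
}

slopes_small <- sim_slopes(5)
slopes_large <- sim_slopes(100)

# Q-Q plots of the estimated slopes for a small and a larger sample size.
op <- par(mfrow = c(1, 2))
qqnorm(slopes_small, main = "n = 5");   qqline(slopes_small)
qqnorm(slopes_large, main = "n = 100"); qqline(slopes_large)
par(op)
```

The errors are equally skewed in both runs; what changes between the panels is only the number of observations behind each slope estimate.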
But this all depends on another assumption that I don't think that
you have even considered. Yes we can use normal theory even when the
random part of the data is not normally distributed, but this still
assumes that the data is at least interval data, i.e. that we firmly
believe that the difference between a response of 1 and a response of 2
is exactly the same as a difference between a 6 and a 7 and that the
difference from 4 to 6 is exactly twice that of 1 vs. 2. From your
data and other descriptions, I don't think that that is a reasonable
assumption. If you are not willing to make that assumption (like me)
then means and normal theory tests are meaningless and you should use
other approaches. One possibility is to use non-parametric methods
(which I believe Frank has already suggested you use), another is to
use proportional odds logistic regression.
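As a pointer for the last suggestion, here is a minimal sketch of a proportional odds fit using polr() from the MASS package, run on the housing example data that ships with MASS (not the data from this thread). The commented-out call at the end is an untested guess at how the same model might be written for the poster's scrd data.

```r
library(MASS)  # provides polr() and the 'housing' example data

# 'Sat' is an ordered factor (Low < Medium < High); Freq holds cell counts.
fit_po <- polr(Sat ~ Infl + Type + Cont, weights = Freq,
               data = housing, Hess = TRUE)
summary(fit_po)

# Coefficients on the odds-ratio scale.
exp(coef(fit_po))

# Assumption, not run here: the analogous call for the thread's data might be
# polr(factor(response, ordered = TRUE) ~ stimulus + condition, data = scrd)
```

The model only uses the ordering of the response categories, so it does not rely on the step from 1 to 2 being the same size as the step from 6 to 7.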
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[email protected]
801.408.8111
-----Original Message-----
From: [email protected] [mailto:r-help-boun...@r-project.org] On Behalf Of Frodo Jedi
Sent: Wednesday, January 05, 2011 3:22 PM
To: Robert Baer; [email protected]
Subject: Re: [R] Assumptions for ANOVA: the right way to check the
normality
Dear Robert,
thanks so much!!! Now I understand!
So you also think that I have to check only the residuals and not the data
directly.
Now, just out of curiosity, I ran the Shapiro test on the residuals. The
problem is that for fit3 the test does not say
that the data are normally distributed. Why? Here is the output:
shapiro.test(residuals(fit1))
Shapiro-Wilk normality test
data: residuals(fit1)
W = 0.9848, p-value = 0.05693
# Here the test is ok: the test says that the data are distributed normally
# (p-value greater than 0.05)
shapiro.test(residuals(fit2))
Shapiro-Wilk normality test
data: residuals(fit2)
W = 0.9853, p-value = 0.06525
# Here the test is ok: the test says that the data are distributed normally
# (p-value greater than 0.05)
shapiro.test(residuals(fit3))
Shapiro-Wilk normality test
data: residuals(fit3)
W = 0.9621, p-value = 0.0001206
Now the test gives a p-value lower than 0.05, so the residuals for fit3
are not distributed normally....
Why do I get this behaviour? In the histogram and Q-Q plot, the fit3
residuals look normally distributed.
________________________________
From: Robert Baer<[email protected]>
Sent: Wed, January 5, 2011 8:56:50 PM
Subject: Re: [R] Assumptions for ANOVA: the right way to check the
normality
Someone suggested to me that I don't have to check the normality of the
data, but the normality of the residuals I get after fitting the linear
model.
I really ask you to help me understand this point, as I can't find enough
material online to resolve it.
Try the following:
# using your scrd data and your proposed models
fit1<- lm(response ~ stimulus + condition + stimulus:condition,
data=scrd)
fit2<- lm(response ~ stimulus + condition, data=scrd)
fit3<- lm(response ~ condition, data=scrd)
# Set up for 6 plots on 1 panel
op = par(mfrow=c(2,3))
# residuals function extracts residuals
# Visual inspection is a good start for checking normality
# You get a much better feel than from some "magic number" statistic
hist(residuals(fit1))
hist(residuals(fit2))
hist(residuals(fit3))
# especially qqnorm() plots which are linear for normal data
qqnorm(residuals(fit1))
qqnorm(residuals(fit2))
qqnorm(residuals(fit3))
# Restore plot parameters
par(op)
If the data are not normally distributed I have to use the Kruskal-Wallis
test and not the ANOVA...so please help
me to understand.
Indeed - Kruskal-Wallis is a good test to use for one factor data that is
ordinal so it is a good alternative to your fit3.
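A small sketch of such a one-factor test. It uses only the 29 responses for stimulus flat_550_W_realism (rows 1 to 29 of the table posted in this thread), typed in by hand; the data frame name sub is made up. With the full data the call would presumably be kruskal.test(response ~ condition, data = scrd).

```r
# Responses for stimulus flat_550_W_realism, copied from rows 1-29 of the
# table in this thread (15 under condition A, 14 under condition AH).
sub <- data.frame(
  condition = factor(rep(c("A", "AH"), times = c(15, 14))),
  response  = c(3, 3, 5, 3, 3, 3, 3, 5, 3, 3, 5, 7, 5, 2, 3,
                7, 4, 5, 3, 6, 5, 3, 5, 5, 7, 2, 7, 5, 5)
)

# One-way Kruskal-Wallis rank sum test of response across the two conditions.
kw <- kruskal.test(response ~ condition, data = sub)
kw
```

Because this uses a single stimulus, it is not the same comparison as fit3, which pools all stimuli.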
Your "response" seems to be a discrete variable rather than a continuous
variable.
You must decide if it is reasonable to approximate it with a normal
distribution, which is by definition continuous.
Here is a numerical example. Could you please tell me if the data in this
table are normally distributed or not?
Help!
number stimulus condition response
1 flat_550_W_realism A 3
2 flat_550_W_realism A 3
3 flat_550_W_realism A 5
4 flat_550_W_realism A 3
5 flat_550_W_realism A 3
6 flat_550_W_realism A 3
7 flat_550_W_realism A 3
8 flat_550_W_realism A 5
9 flat_550_W_realism A 3
10 flat_550_W_realism A 3
11 flat_550_W_realism A 5
12 flat_550_W_realism A 7
13 flat_550_W_realism A 5
14 flat_550_W_realism A 2
15 flat_550_W_realism A 3
16 flat_550_W_realism AH 7
17 flat_550_W_realism AH 4
18 flat_550_W_realism AH 5
19 flat_550_W_realism AH 3
20 flat_550_W_realism AH 6
21 flat_550_W_realism AH 5
22 flat_550_W_realism AH 3
23 flat_550_W_realism AH 5
24 flat_550_W_realism AH 5
25 flat_550_W_realism AH 7
26 flat_550_W_realism AH 2
27 flat_550_W_realism AH 7
28 flat_550_W_realism AH 5
29 flat_550_W_realism AH 5
30 bump_2_step_W_realism A 1
31 bump_2_step_W_realism A 3
32 bump_2_step_W_realism A 5
33 bump_2_step_W_realism A 1
34 bump_2_step_W_realism A 3
35 bump_2_step_W_realism A 2
36 bump_2_step_W_realism A 5
37 bump_2_step_W_realism A 4
38 bump_2_step_W_realism A 4
39 bump_2_step_W_realism A 4
40 bump_2_step_W_realism A 4
41 bump_2_step_W_realism AH 3
42 bump_2_step_W_realism AH 5
43 bump_2_step_W_realism AH 1
44 bump_2_step_W_realism AH 5
45 bump_2_step_W_realism AH 4
46 bump_2_step_W_realism AH 4
47 bump_2_step_W_realism AH 5
48 bump_2_step_W_realism AH 4
49 bump_2_step_W_realism AH 3
50 bump_2_step_W_realism AH 4
51 bump_2_step_W_realism AH 5
52 bump_2_step_W_realism AH 4
53 hole_2_step_W_realism A 3
54 hole_2_step_W_realism A 3
55 hole_2_step_W_realism A 4
56 hole_2_step_W_realism A 1
57 hole_2_step_W_realism A 4
58 hole_2_step_W_realism A 3
59 hole_2_step_W_realism A 5
60 hole_2_step_W_realism A 4
61 hole_2_step_W_realism A 3
62 hole_2_step_W_realism A 4
63 hole_2_step_W_realism A 7
64 hole_2_step_W_realism A 5
65 hole_2_step_W_realism A 1
66 hole_2_step_W_realism A 4
67 hole_2_step_W_realism AH 7
68 hole_2_step_W_realism AH 5
69 hole_2_step_W_realism AH 5
70 hole_2_step_W_realism AH 1
71 hole_2_step_W_realism AH 5
72 hole_2_step_W_realism AH 5
73 hole_2_step_W_realism AH 5
74 hole_2_step_W_realism AH 2
75 hole_2_step_W_realism AH 6
76 hole_2_step_W_realism AH 5
77 hole_2_step_W_realism AH 5
78 hole_2_step_W_realism AH 6
79 bump_2_heel_toe_W_realism A 3
80 bump_2_heel_toe_W_realism A 3
81 bump_2_heel_toe_W_realism A 3
82 bump_2_heel_toe_W_realism A 2
83 bump_2_heel_toe_W_realism A 3
84 bump_2_heel_toe_W_realism A 3
85 bump_2_heel_toe_W_realism A 4
86 bump_2_heel_toe_W_realism A 3
87 bump_2_heel_toe_W_realism A 4
88 bump_2_heel_toe_W_realism A 4
89 bump_2_heel_toe_W_realism A 6
90 bump_2_heel_toe_W_realism A 5
91 bump_2_heel_toe_W_realism A 4
92 bump_2_heel_toe_W_realism AH 7
93 bump_2_heel_toe_W_realism AH 3
94 bump_2_heel_toe_W_realism AH 4
95 bump_2_heel_toe_W_realism AH 2
96 bump_2_heel_toe_W_realism AH 5
97 bump_2_heel_toe_W_realism AH 6
98 bump_2_heel_toe_W_realism AH 4
99 bump_2_heel_toe_W_realism AH 4
100 bump_2_heel_toe_W_realism AH 4
101 bump_2_heel_toe_W_realism AH 5
102 bump_2_heel_toe_W_realism AH 2
103 bump_2_heel_toe_W_realism AH 6
104 bump_2_heel_toe_W_realism AH 5
105 hole_2_heel_toe_W_realism A 3
106 hole_2_heel_toe_W_realism A 3
107 hole_2_heel_toe_W_realism A 1
108 hole_2_heel_toe_W_realism A 3
109 hole_2_heel_toe_W_realism A 3
110 hole_2_heel_toe_W_realism A 5
111 hole_2_heel_toe_W_realism A 2
112 hole_2_heel_toe_W_realism AH 5
113 hole_2_heel_toe_W_realism AH 1
114 hole_2_heel_toe_W_realism AH 3
115 hole_2_heel_toe_W_realism AH 6
116 hole_2_heel_toe_W_realism AH 5
117 hole_2_heel_toe_W_realism AH 4
118 hole_2_heel_toe_W_realism AH 4
119 hole_2_heel_toe_W_realism AH 3
120 hole_2_heel_toe_W_realism AH 3
121 hole_2_heel_toe_W_realism AH 1
122 hole_2_heel_toe_W_realism AH 5
123 bump_2_combination_W_realism A 4
124 bump_2_combination_W_realism A 2
125 bump_2_combination_W_realism A 4
126 bump_2_combination_W_realism A 1
127 bump_2_combination_W_realism A 4
128 bump_2_combination_W_realism A 4
129 bump_2_combination_W_realism A 2
130 bump_2_combination_W_realism A 4
131 bump_2_combination_W_realism A 2
132 bump_2_combination_W_realism A 4
133 bump_2_combination_W_realism A 2
134 bump_2_combination_W_realism A 6
135 bump_2_combination_W_realism AH 7
136 bump_2_combination_W_realism AH 3
137 bump_2_combination_W_realism AH 4
138 bump_2_combination_W_realism AH 1
139 bump_2_combination_W_realism AH 6
140 bump_2_combination_W_realism AH 5
141 bump_2_combination_W_realism AH 5
142 bump_2_combination_W_realism AH 6
143 bump_2_combination_W_realism AH 5
144 bump_2_combination_W_realism AH 4
145 bump_2_combination_W_realism AH 2
146 bump_2_combination_W_realism AH 4
147 bump_2_combination_W_realism AH 2
148 bump_2_combination_W_realism AH 5
149 hole_2_combination_W_realism A 5
150 hole_2_combination_W_realism A 2
151 hole_2_combination_W_realism A 4
152 hole_2_combination_W_realism A 1
153 hole_2_combination_W_realism A 5
154 hole_2_combination_W_realism A 4
155 hole_2_combination_W_realism A 3
156 hole_2_combination_W_realism A 5
157 hole_2_combination_W_realism A 2
158 hole_2_combination_W_realism A 5
159 hole_2_combination_W_realism A 5
160 hole_2_combination_W_realism A 1
161 hole_2_combination_W_realism AH 7
162 hole_2_combination_W_realism AH 5
163 hole_2_combination_W_realism AH 3
164 hole_2_combination_W_realism AH 1
165 hole_2_combination_W_realism AH 6
166 hole_2_combination_W_realism AH 4
167 hole_2_combination_W_realism AH 7
168 hole_2_combination_W_realism AH 5
169 hole_2_combination_W_realism AH 5
170 hole_2_combination_W_realism AH 2
171 hole_2_combination_W_realism AH 6
172 hole_2_combination_W_realism AH 2
173 hole_2_combination_W_realism AH 4
Thanks in advance
[[alternative HTML version deleted]]
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.