Re: [R] Assumptions for ANOVA: the right way to check the normality

Greg Snow Tue, 11 Jan 2011 13:16:58 -0800

> From: Frodo Jedi [mailto:frodo.j...@yahoo.com] 
> Sent: Monday, January 10, 2011 5:44 PM
> To: Greg Snow
> Cc: r-help@r-project.org
> Subject: Re: [R] Assumptions for ANOVA: the right way to check the normality
> 
> Dear Greg,
> first of all thanks for your reply. And I add also many thanks to all of you 
> guys who are helping me, sorry for the amount of questions I recently posted 
> ;-) 
> 
> I don't have a solid statistics background (I am not a statistician) and I am 
> basically learning everything by myself. 
> So my first goal is TO UNDERSTAND. I need to have general guidelines because 
> for my PhD I am doing, and will do, several psychophysics experiments.
> I am totally alone in this challenge, so I am asking you guys for some help, as 
> I think that this is the best place to pick up the thing that I lack
> and that I will never find in any book: experience.


Isn't there a single statistician anywhere in the University?  Does your 
committee have any experience with any of this?

> >What is the question you are really trying to find the answer for?  Knowing 
> >that may help us give more meaningful answers.
> 
> Concerning your question, I thought I had been clear. I want to understand 
> which analysis I have to use in order to determine whether 
> the differences I am seeing are statistically significant or not. Since 
> all the books I have read say that to apply ANOVA 
> I must satisfy the assumption of normality, I am trying to find a way to 
> understand this.

A general run of anova procedures will produce multiple p-values addressing 
multiple null hypotheses addressing many different questions (often many of 
which are uninteresting).  Which terms are you really trying to test, and which 
are included because you already know that they have an effect?

Are you including interactions because you find them actually interesting? Or 
just because that is what everyone else does?

[snip]
 
> >Also remember that the normality of the data/residuals/etc. is not as 
> >important as the CLT for your sample size.  The main things that make the 
> >CLT not work (for samples that are not large enough) are outliers and 
> >strong skewness, since your outcome is limited to the numbers 1-7, I don’t 
> >see outliers or skewness being a real problem.  So you are probably fine 
> >for fixed effects style models (though checking with experts in your area or 
> >doing simulations can support/counter this).  
> 
> As far as I have seen everyone in my field does ANOVA.

[imagine best Mom voice] and if everyone in your field jumped off a cliff . . .

Do you want to do what everyone else is doing, or something new and different?

What does your committee chair say about this?
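
The kind of simulation mentioned in the quoted paragraph above could look 
something like the sketch below.  It is only an illustration, not an analysis 
of your data: the response probabilities, the group size, and the number of 
replications are all assumptions made up for the example.  Both groups are 
drawn from the same skewed 1-7 distribution, so the null hypothesis is true, 
and if normal theory is adequate the t-test should reject about 5% of the time.

```r
set.seed(42)                      # arbitrary seed, for reproducibility only

# Hypothetical skewed distribution over the 7 response categories
probs <- c(0.30, 0.25, 0.15, 0.10, 0.10, 0.05, 0.05)
n.per.group <- 20                 # assumed group size
n.sims <- 2000                    # assumed number of replications

# Both groups come from the same distribution, so the null is true
pvals <- replicate(n.sims, {
  g1 <- sample(1:7, n.per.group, replace = TRUE, prob = probs)
  g2 <- sample(1:7, n.per.group, replace = TRUE, prob = probs)
  t.test(g1, g2)$p.value
})

# Estimated type I error rate at the nominal 5% level
reject.rate <- mean(pvals < 0.05)
reject.rate
```

Changing probs and n.per.group to something resembling your own experiment 
shows how far the rejection rate drifts from 0.05 in your situation.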

> >But when you add in random effects then there is a lot of uncertainty about 
> >if the normal theory still holds, the latest lme code uses mcmc sampling 
> >rather than depending on normal theory and is still being developed.
> 
> 
> By "random effects" do you mean repeated measures? If so, why did 
> statisticians develop ANOVA with repeated measures if there is so much 
> uncertainty?

Repeated measures are one type of random effect analysis, but random and mixed 
effects models are more general than just repeated measures.

Statisticians developed those methods because they worked for simple cases, 
made some sense for more complicated cases, and they did not have anything that 
was both better and practical.  Now with modern computers we can see when those 
do work (unfortunately not as often as had been hoped) and what was once 
impractical is now much simpler (but inertia is to do it the old way, even 
though the people who developed the old way would have preferred to do it our 
way).  The article: 

Why Permutation Tests Are Superior to t and F Tests in Biomedical Research
John Ludbrook and Hugh Dudley
The American Statistician
Vol. 52, No. 2 (May, 1998), pp. 127-132

may be enlightening here (and give possible alternatives).
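
To give a flavor of what a permutation test does, here is a minimal sketch in 
base R.  The ratings are made up, and the example assumes two independent 
groups; with repeated measures the shuffling would have to respect the design 
(for example, permuting conditions within each subject), so treat this as an 
illustration of the idea rather than a recipe for your data.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only

# Made-up ratings on a 1-7 scale for two hypothetical conditions
a <- c(5, 6, 4, 7, 5, 6, 5, 4, 6, 7)
b <- c(3, 4, 4, 5, 2, 5, 3, 4, 3, 5)

obs.diff <- mean(a) - mean(b)     # observed difference in means
pooled <- c(a, b)
n.a <- length(a)

# Reshuffle the group labels many times and recompute the difference
perm.diffs <- replicate(5000, {
  idx <- sample(length(pooled), n.a)
  mean(pooled[idx]) - mean(pooled[-idx])
})

# Two-sided p-value: how often is a shuffled difference at least as extreme?
# The +1 terms count the observed arrangement as one of the permutations.
p.perm <- (1 + sum(abs(perm.diffs) >= abs(obs.diff))) / (1 + length(perm.diffs))
p.perm
```

No normality assumption enters anywhere; the reference distribution comes 
entirely from reshuffling the observed values.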

Also see: 
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2009q1/001819.html

for some simulation involving mixed models.  One shows that the normal theory 
works fine for that particular case, the next one shows a case where the normal 
theory does not work, then shows how to use simulation (parametric bootstrap) 
to get a more appropriate p-value.  You can adapt those examples for your own 
situation.
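
The parametric bootstrap idea can be sketched roughly as follows.  This is not 
the code from the linked post; it assumes the lme4 package is installed and 
uses the sleepstudy data that ships with it purely as a stand-in for your own 
data.  The model formulas, the 200 replicates, and the seed are illustrative 
choices.

```r
library(lme4)   # assumed to be installed; provides lmer(), refit(), sleepstudy

# Full and null models fitted by maximum likelihood so they can be compared
full <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = FALSE)
null <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy, REML = FALSE)

# Observed likelihood ratio statistic
obs.lrt <- as.numeric(2 * (logLik(full) - logLik(null)))

# Simulate new responses from the fitted null model
sims <- simulate(null, nsim = 200, seed = 1)

# Refit both models to each simulated response and recompute the statistic
boot.lrt <- vapply(sims, function(y) {
  as.numeric(2 * (logLik(refit(full, y)) - logLik(refit(null, y))))
}, numeric(1))

# Bootstrap p-value: proportion of simulated statistics at least as large
p.boot <- (1 + sum(boot.lrt >= obs.lrt)) / (1 + length(boot.lrt))
p.boot
```

The p-value comes from the simulated distribution of the statistic under the 
null model instead of from a chi-squared or F reference distribution.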

>  
> >This now comes back to my first question: what are you trying to find out?
> 
> My ultimate goal is to find the p-values in order to understand if my results 
> are significant or not. So I can write them in the paper ;-)

There is a function in the TeachingDemos package that will produce p-values if 
that is all you want; these are independent of any normality assumptions, 
independent of any data in fact.  However, they don't really help with 
understanding.

Graphing the data (I think you have done this already) is the best route to 
understanding.  If you need more than that, then consider the following article:

     Buja, A., Cook, D., Hofmann, H., Lawrence, M., Lee, E.-K., Swayne,
     D.F. and Wickham, H. (2009) Statistical inference for exploratory
     data analysis and model diagnostics. Phil. Trans. R. Soc. A
     367, 4361-4383. doi: 10.1098/rsta.2009.0120

Some of the tests there are implemented in the vis.test function in the 
TeachingDemos package (you need to understand your null hypothesis and what you 
are testing).

>  
> >You may not need to do anova or that type of model.  Some simple hypotheses 
> >may be answered using McNemar's test on your data.  If you want to do 
> >predictions then linear models will be meaningless (what would a prediction 
> >of -3.2, 4.493, or 8.1 mean on a 7 point Likert scale?) and something like 
> >proportional odds logistic regression will be much more meaningful.  
> >Between those are bootstrap and permutation methods that may answer your 
> >question without any normality assumptions.
> 
> Ok. But is the ANOVA analysis I have done so far wrong or not? I think it is 
> valid, since the results seem consistent with what one can see 
> by looking at the means.
> 

George Box is often quoted as saying: "Essentially, all models are wrong, but 
some are useful."

So the question is not whether they are wrong, but whether they are useful 
(some of the other techniques mentioned may be more useful, or what you have 
done may be useful enough).
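
As one concrete example of the alternatives quoted above, a proportional odds 
logistic regression can be fitted with polr() from the MASS package (which 
ships with R).  The data below are made up and the single two-level predictor 
is an assumption for illustration; your real model would reflect your design.

```r
library(MASS)   # polr() lives in MASS

# Made-up data: 30 ratings on a 1-7 scale in each of two hypothetical conditions
rating <- c(rep(1:7, times = c(2, 4, 6, 8, 5, 3, 2)),
            rep(1:7, times = c(1, 2, 3, 5, 7, 7, 5)))
dat <- data.frame(
  cond   = factor(rep(c("A", "B"), each = 30)),
  rating = factor(rating, levels = 1:7, ordered = TRUE)
)

# Proportional odds model; Hess = TRUE keeps the Hessian for standard errors
fit <- polr(rating ~ cond, data = dat, Hess = TRUE)
summary(fit)

# Predicted probability of each of the 7 categories under each condition
pred <- predict(fit, newdata = data.frame(cond = factor(c("A", "B"))),
                type = "probs")
round(pred, 3)
```

The predictions here are probabilities for each response category, so they 
always stay on the 1-7 scale, unlike a linear model's fitted values.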


> Thanks for sharing your precious experience with me. I think the world 
> becomes better when people help each other.
> 
> All the best
> 
 
-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111
 


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
