Many thanks for your feedback, Greg.
You have been very enlightening.

Now it is time for me to study the material you kindly provided. Thanks.
________________________________
From: Greg Snow <greg.s...@imail.org>

Cc: "r-help@r-project.org" <r-help@r-project.org>
Sent: Tue, January 11, 2011 10:13:34 PM
Subject: RE: [R] Assumptions for ANOVA: the right way to check the normality


> Sent: Monday, January 10, 2011 5:44 PM
> To: Greg Snow
> Cc: r-help@r-project.org
> Subject: Re: [R] Assumptions for ANOVA: the right way to check the normality
> 
> Dear Greg,
> First of all, thanks for your reply. And many thanks also to all of you
> guys who are helping me; sorry for the number of questions I recently posted ;-)
>
> I don't have a solid statistics background (I am not a statistician) and I am
> basically learning everything by myself.
>
> So my first goal is TO UNDERSTAND. I need general guidelines because for my
> PhD I am doing, and will do, several psychophysical experiments.
> I am totally alone in this challenge, so I am asking you guys for help, as I
> think this is the best place to exchange the one thing I will never find in
> any book: experience.

Isn't there a single statistician anywhere in the University?  Does your 
committee have any experience with any of this?

> > What is the question you are really trying to find the answer for? Knowing
> > that may help us give more meaningful answers.
> 
> Concerning your question, I thought I had been clear. I want to understand
> which analysis I have to use in order to determine whether the differences I
> am seeing are statistically significant or not. Now, since all the books I
> have read say that to apply ANOVA I must respect the assumption of normality,
> I am trying to find a way to check this.

A general run of ANOVA procedures will produce multiple p-values addressing
multiple null hypotheses for many different questions (often many of which are
uninteresting).  Which terms are you really trying to test, and which are
included because you already know that they have an effect?

Are you including interactions because you find them actually interesting? Or 
just because that is what everyone else does?
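To see the "multiple p-values" point concretely, here is a minimal sketch in R (the data frame and factor names are hypothetical, purely for illustration): a two-factor `aov` fit reports a separate F-test and p-value for each term, and it is up to you to decide which of those tests you actually care about.

```r
set.seed(42)
## hypothetical data: a 1-7 rating under two crossed factors
d <- data.frame(
  rating    = sample(1:7, 40, replace = TRUE),
  condition = gl(2, 20, labels = c("A", "B")),
  group     = gl(4, 5, 40)
)
fit <- aov(rating ~ condition * group, data = d)
summary(fit)  # one p-value per term: condition, group, and condition:group
```

Note that the interaction term `condition:group` gets its own p-value whether or not the interaction is a question you ever intended to ask.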

[snip]

> > Also remember that the normality of the data/residuals/etc. is not as
> > important as the CLT for your sample size.  The main things that make the
> > CLT not work (for samples that are not large enough) are outliers and
> > strong skewness.  Since your outcome is limited to the numbers 1-7, I don't
> > see outliers or skewness being a real problem.  So you are probably fine
> > for fixed-effects style models (though checking with experts in your area
> > or doing simulations can support/counter this).
>
> 
> As far as I have seen, everyone in my field does ANOVA.

[imagine best Mom voice] and if everyone in your field jumped off a cliff . . .

Do you want to do what everyone else is doing, or something new and different?

What does your committee chair say about this?

> > But when you add in random effects there is a lot of uncertainty about
> > whether the normal theory still holds; the latest lme code uses MCMC
> > sampling rather than depending on normal theory and is still being
> > developed.
> 
> 
> For "random effects" do you mean the repeated measures right? So why 
> staticians 
>developed the ANOVA with repeated measure if there is so much uncertainty?

Repeated measures are one type of random-effect analysis, but random and mixed
effects models are more general than just repeated measures.

Statisticians developed those methods because they worked for simple cases,
made some sense for more complicated cases, and they did not have anything that
was both better and practical.  Now with modern computers we can see when those
methods do work (unfortunately not as often as had been hoped), and what was
once impractical is now much simpler (but the inertia is to do it the old way,
even though the people who developed the old way would have preferred to do it
our way).  The article:


Why Permutation Tests Are Superior to t and F Tests in Biomedical Research
John Ludbrook and Hugh Dudley
The American Statistician
Vol. 52, No. 2 (May, 1998), pp. 127-132

may be enlightening here (and give possible alternatives).
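A basic two-sample permutation test of the kind that article advocates needs only base R. A minimal sketch, with hypothetical ratings standing in for your data: the observed difference in means is compared to the differences obtained by repeatedly shuffling the group labels.

```r
set.seed(1)
## hypothetical 1-7 ratings for two conditions
g1 <- c(5, 6, 4, 7, 5)
g2 <- c(3, 4, 2, 4, 3)

obs    <- mean(g1) - mean(g2)   # observed difference in means
pooled <- c(g1, g2)

## resample group assignments many times under the null of no difference
perm <- replicate(10000, {
  idx <- sample(length(pooled), length(g1))
  mean(pooled[idx]) - mean(pooled[-idx])
})

## two-sided permutation p-value
p.value <- mean(abs(perm) >= abs(obs))
```

No normality assumption enters anywhere: the null distribution is built entirely from the data themselves.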

Also see: 
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2009q1/001819.html 

for some simulations involving mixed models.  One shows that the normal theory
works fine for that particular case; the next shows a case where the normal
theory does not work, then shows how to use simulation (a parametric bootstrap)
to get a more appropriate p-value.  You can adapt those examples for your own
situation.

>  
> >This now comes back to my first question: what are you trying to find out?
> 
> My ultimate goal is to find the p-values in order to determine whether my
> results are significant or not, so I can write them in the paper ;-)

There is a function in the TeachingDemos package that will produce p-values if
that is all you want; these are independent of any normality assumptions,
independent of any data in fact.  However, they don't really help with
understanding.

Graphing the data (I think you have done this already) is the best route to
understanding.  If you need more than that, then consider the following article:

     Buja, A., Cook, D., Hofmann, H., Lawrence, M., Lee, E.-K., Swayne,
     D.F. and Wickham, H. (2009) Statistical inference for exploratory
     data analysis and model diagnostics. Phil. Trans. R. Soc. A,
     367, 4361-4383. doi: 10.1098/rsta.2009.0120

Some of the tests there are implemented in the vis.test function in the 
TeachingDemos package (you need to understand your null hypothesis and what you 
are testing).

>  
> > You may not need to do anova or that type of model.  Some simple hypotheses
> > may be answered using McNemar's test on your data.  If you want to do
> > predictions then linear models will be meaningless (what would a prediction
> > of -3.2, 4.493, or 8.1 mean on a 7-point Likert scale?) and something like
> > proportional odds logistic regression will be much more meaningful.
> > Between those are bootstrap and permutation methods that may answer your
> > question without any normality assumptions.
> 
> OK. But is the ANOVA analysis I did so far wrong or not? I think it is quite
> valid, since the results seem coherent with what one can see
> looking at the means.
> 

George Box is often quoted as saying: "Essentially, all models are wrong, but 
some are useful."

So the question is not whether they are wrong, but whether they are useful
(some of the other techniques mentioned may be more useful, or what you have
done may be useful enough).
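For the proportional odds logistic regression suggested earlier, `polr` in the MASS package is the usual route. A minimal sketch, again with hypothetical data frame and column names (`d`, `rating`, `condition`): the response is treated as an ordered factor, so predictions stay on the original 1-7 scale instead of arbitrary real numbers.

```r
library(MASS)

## hypothetical: data frame d with a 1-7 rating and a factor condition
fit <- polr(ordered(rating) ~ condition, data = d, Hess = TRUE)

summary(fit)                  # coefficients on the log-odds scale
predict(fit, type = "class")  # predicted category, always one of 1..7
predict(fit, type = "probs")  # predicted probability of each category
```

Unlike a linear model, this can never predict -3.2 or 8.1 for a 7-point Likert item, which is exactly the objection raised above.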


> Thanks for sharing your precious experience with me. I think the world
> becomes better when people help each other.
> 
> All the best
> 

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


      

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
