I thought your question was well expressed and that you followed the posting guide better than most.
I'm no expert on such issues, but I'd like to kick in a few opinions (with which others may disagree).

(1) All of the anova stuff is based on the assumption of homogeneity of variance. However, my understanding is that the procedure is quite robust to violations of this assumption. Problems may arise if some cells have small sample sizes and if those small samples are associated with large variances. Otherwise there is not all that much to worry about.

(2) The Tukey test is indeed based on the assumption of equal sample sizes. The version of the test for unbalanced data (the Tukey-Kramer method) is an approximation. My understanding is that it's a pretty good approximation.

(3) For multiple comparisons after applying the Kruskal-Wallis test: experts on non-parametric statistics may know about more powerful methods, but I would be inclined simply to apply a Bonferroni correction to a collection of pairwise tests (e.g. wilcox.test()). Just multiply the p-values by the number of pairwise comparisons, k-choose-2, where k is the number of groups (3-choose-2 = 3 in your toy example), capping the results at 1.

(4) Generally speaking, I would say that if a classical test and a non-parametric test give different answers, then your data are being coy about revealing their true import. I would have very little faith in either answer, and would claim that you really need more data. Unfortunately this need can rarely be satisfied. If you have to make a decision one way or another, then you should go with the non-parametric answer.

(5) Finally, my universal prescription is: ``When in doubt, simulate.'' I.e., simulate multiple data sets on the basis of models fitted to, or related to, your real data. Run the possible tests on the simulated data sets. Since these data are simulated, you know what the right answer is. Count up how often you get the right answer. Such an exercise can be quite revealing.
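For concreteness, here is a minimal sketch of the Bonferroni recipe in (3), applied to the toy data from the original post. The variable names (g, m, raw_p, bonf_p) are illustrative choices of mine, and passing exact = FALSE to wilcox.test() is my own assumption, made because the third group contains tied values.

```r
# Toy data from the original post: group labels (a) and responses (b).
a <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,3,3,3,3,3,3,3)
b <- c(101,1010,200,300,400,202,121,234,55,555,66,76,88,34,239,30,40,50,50,60)
g <- factor(a)

# Overall non-parametric test across the k groups.
kw <- kruskal.test(b, g)

# Number of pairwise comparisons: k-choose-2 (3 for three groups).
k <- nlevels(g)
m <- choose(k, 2)

# One unadjusted Wilcoxon rank-sum test per pair of groups.
# ASSUMPTION: exact = FALSE uses the normal approximation for every pair,
# since group 3 has tied values (50, 50) and exact p-values need no ties.
pairs <- combn(levels(g), 2)
raw_p <- apply(pairs, 2, function(pr) {
  wilcox.test(b[g == pr[1]], b[g == pr[2]], exact = FALSE)$p.value
})
names(raw_p) <- apply(pairs, 2, paste, collapse = "-")

# Bonferroni: multiply each p-value by m and cap the result at 1.
# (raw_p comes first so that pmin() keeps its names.)
bonf_p <- pmin(raw_p * m, 1)

# Cross-check against base R's p.adjust(), which applies the same rule.
stopifnot(isTRUE(all.equal(bonf_p, p.adjust(raw_p, method = "bonferroni"))))

print(kw)
print(bonf_p)
```

Base R also packages this up as pairwise.wilcox.test(b, g, p.adjust.method = "bonferroni"); the explicit loop is just there to make the "multiply by k-choose-2" step visible.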
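A rough sketch of the "when in doubt, simulate" idea in (5) might look like the following. The model is an assumption of mine, not something fitted formally: normal errors, equal true means in all groups, and group sizes and standard deviations copied from the toy data. Because the means are equal by construction, every rejection counts as a false positive. One caveat: with unequal variances the Kruskal-Wallis null of identical distributions is not strictly true, so its rate is better read as "how often it declares a difference when the means are equal". The names simulate_once, n_sim and false_pos_rate are hypothetical.

```r
# Toy data from the original post, used only to set group sizes and SDs.
a <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,3,3,3,3,3,3,3)
b <- c(101,1010,200,300,400,202,121,234,55,555,66,76,88,34,239,30,40,50,50,60)
g <- factor(a)
n_i <- as.vector(table(g))           # group sizes: 10, 3, 7
sd_i <- as.vector(tapply(b, g, sd))  # observed within-group SDs

set.seed(2008)   # make the simulation reproducible
alpha <- 0.05
n_sim <- 1000

# ASSUMPTION: a "null" model with normal errors in which every group has
# the same mean (0) but keeps its own observed SD and sample size.
simulate_once <- function() {
  y <- rnorm(length(g), mean = 0, sd = sd_i[as.integer(g)])
  fit <- aov(y ~ g)
  c(
    anova_F   = summary(fit)[[1]][["Pr(>F)"]][1],   # classical F-test
    tukey_min = min(TukeyHSD(fit)$g[, "p adj"]),    # smallest Tukey p-value
    kruskal   = kruskal.test(y, g)$p.value
  )
}

p <- replicate(n_sim, simulate_once())  # 3 x n_sim matrix of p-values

# Proportion of simulated data sets in which each procedure rejects.
# For Tukey, "reject" means at least one pair is flagged (family-wise error).
false_pos_rate <- rowMeans(p < alpha)
print(false_pos_rate)
```

Rates well above alpha would suggest that a procedure is not trustworthy for data shaped like these; the same skeleton can be rerun with unequal true means to estimate power instead.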
HTH.

cheers,

Rolf Turner

On 13/03/2008, at 9:19 AM, eugen pircalabelu wrote:

> Hi,
>
> My data was only a toy example that matched the real situation,
> with real data, but I could not have posted the entire data set, and
> so I gave a self-contained example of what I thought was my
> problem. Of course you can see with the naked eye that the data is
> unbalanced (this was done intentionally), but like I said, this was
> only a toy example, mimicking a problem from a real data set.
>
> Thank you and have a great day ahead!
>
>
> David Hewitt <[EMAIL PROTECTED]> wrote:
>
>
>> I have the following problem: how appropriate is my aov model
>> under the violation of anova assumptions?
>>
>> Example:
>> a<-c(1,1,1,1,1,1,1,1,1,1,2,2,2,3,3,3,3,3,3,3)
>> b<-c(101,1010,200,300,400, 202, 121, 234, 55,555,66,76,88,34,239,
>> 30, 40, 50,50,60)
>> z<-data.frame(a, b)
>> fligner.test(z$b, factor(z$a))
>> aov(z$b~factor(z$a))->ll
>> TukeyHSD(ll)
>>
>> Now from the aov I found that my model is unbalanced, and from the
>> Fligner test I found out that the assumption of homogeneity of
>> variances is rejected. Could my Tukey comparison be a valid one under
>> these violations? From what I read, the Tukey test is valid only when
>> the model is balanced and when the assumption of homogeneity of
>> variances is not rejected; am I wrong? Can anyone tell me what would
>> be the correct test in this case?
>>
>> Doing a non-parametric Kruskal-Wallis test would give me a different
>> result. But what would be the correct multiple comparison test in
>> this case?
>>
>
> You shouldn't have needed aov to tell you that the data (not the
> model) are unbalanced. I could see that without running the code!
> Seriously, you might need to think more about the type of model you're
> using, and what you want to know, and then consider how to estimate
> the effect sizes of interest.
>
> -----
> David Hewitt
> Virginia Institute of Marine Science
> http://www.vims.edu/fish/students/dhewitt/
> --
> View this message in context: http://www.nabble.com/question-for-aov-and-kruskal-tp15955385p15976643.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.