On Wednesday 29 August 2012 at 07:37 -0700, andyspeak wrote:
> Hello
> I have a huge data frame with three columns 'Roof' 'Month' and 'Temp'
> i want to run analyses on the numerical Temp data by the factors Roof and
> Month, separately and together.
> For using more than one factor i understand i should use aggregate, but i am
> struggling with the tapply for single factor analysis.
>
> > tapply(Temp, INDEX = Roof, FUN = median)
>
> This works fine, however if i try to do anything a bit more complex, such
> as:
>
> > tapply(Temp, INDEX = Roof, FUN = kruskal.test)
>
> it gives the error - Error in length(g) : 'g' is missing
>
> What could be the problem?

If you read ?kruskal.test, you'll notice that its default method takes
(at least) two arguments, the second being "g". Its description is:

       g: a vector or factor object giving the group for the
          corresponding elements of ‘x’.  Ignored if ‘x’ is a list.

So you do not need tapply(): just call

    kruskal.test(Temp, Roof)

The "theoretical" reason you cannot use tapply() is that it calls "FUN"
separately for each subset of the data. kruskal.test() would therefore only
ever see one group's values at a time, with no grouping argument "g", and
never the whole data set, which it needs in order to test for differences
between groups.

Regards
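
P.S. To make the difference concrete, here is a minimal sketch on made-up data. The data frame "dat" and its values are hypothetical stand-ins for your own data; only the column names Roof, Month and Temp come from your message. It contrasts a per-group summary, which suits tapply(), with a test that needs all groups in one call.

```r
# Hypothetical toy data standing in for the original data frame
set.seed(1)
dat <- data.frame(
  Roof  = factor(rep(c("A", "B", "C"), each = 10)),
  Month = factor(rep(c("Jan", "Feb"), times = 15)),
  Temp  = rnorm(30, mean = 15, sd = 3)
)

# tapply() passes FUN one subset at a time: fine for per-group summaries
med <- tapply(dat$Temp, INDEX = dat$Roof, FUN = median)

# kruskal.test() needs all values plus the grouping factor in a single call
kt_roof <- kruskal.test(dat$Temp, dat$Roof)

# The same test written with the formula interface
kt_formula <- kruskal.test(Temp ~ Roof, data = dat)

print(med)
print(kt_roof)
```

The formula version is often more convenient with a data frame, since it avoids repeating "dat$" or attaching the data.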