On 2013-03-11 13:52, Marius Hofert wrote:
Dear expeRts,
The question is rather simple: Why does aggregate() (or, similarly, tapply())
not keep the order of the grouping variable(s)?
Here is an example:
x <- data.frame(group = rep(LETTERS[1:2], each=10),
                year = rep(rep(2001:2005, each=2), 2),
                value = rep(1:10, each=2))
## => sorted according to group, then year
aggregate(value ~ group + year, data=x, FUN=function(z) z[1])
## => sorted according to year, then group
I rather expected this to be the default:
aggregate(value ~ year + group, data=x, FUN=function(z) z[1])[,c(2,1,3)]
## => same order as input (grouping) variables
Same with tapply:
as.data.frame(as.table(tapply(x$value, list(x$group, x$year),
                              FUN=function(z) z[1])))
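For what it's worth, the tapply() ordering seems to follow from how R stores matrices. The sketch below rebuilds the same x and assumes the usual column-major unrolling of a table; the names m, long and long_t are only illustrative.

```r
# Same example data as above
x <- data.frame(group = rep(LETTERS[1:2], each = 10),
                year  = rep(rep(2001:2005, each = 2), 2),
                value = rep(1:10, each = 2))

# tapply() returns a 2 x 5 matrix: groups in rows, years in columns
m <- tapply(x$value, list(x$group, x$year), FUN = function(z) z[1])

# Unrolling the table walks down each column first, so the first
# grouping variable (Var1 = group) varies fastest
long <- as.data.frame(as.table(m))

# Transposing first makes year vary fastest, which gives rows
# ordered by group, then year (Var1 = year, Var2 = group)
long_t <- as.data.frame(as.table(t(m)))
```

Here long lists A and B for 2001, then A and B for 2002, and so on, while long_t lists all years for A before all years for B.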
Cheers,
Marius
I'm no expeRt, but suppose that we change the setup slightly:
xx <- x[sample(nrow(x)), ]
Now what would you like
aggregate(value ~ group + year, data=xx, FUN=function(z) z[1])
to return?
Personally, I prefer to have R return the same thing regardless
of how the input dataframe is sorted, i.e. the result should
depend only on the formula. You just have to know that the rows are
ordered with the first factor varying most rapidly, then the next, etc.
I think that's documented somewhere, but I don't know where.
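A small sketch of what I mean, rebuilding x from the original post. The seed and the names first, a_x, a_xx and by_group are my own choices for illustration; the check works here because both rows in each group/year cell carry the same value, so z[1] does not depend on row order.

```r
# Same example data as in the original post
x <- data.frame(group = rep(LETTERS[1:2], each = 10),
                year  = rep(rep(2001:2005, each = 2), 2),
                value = rep(1:10, each = 2))

set.seed(1)                    # arbitrary seed, only for reproducibility
xx <- x[sample(nrow(x)), ]     # same rows in random order

first <- function(z) z[1]
a_x  <- aggregate(value ~ group + year, data = x,  FUN = first)
a_xx <- aggregate(value ~ group + year, data = xx, FUN = first)

# The result should not depend on how the input rows are sorted
same <- isTRUE(all.equal(a_x, a_xx))

# If rows ordered by group, then year are wanted, sort explicitly
by_group <- a_x[order(a_x$group, a_x$year), ]
```

Sorting the result with order() states the desired row order directly, rather than relying on the order of terms in the formula or of rows in the input.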
Peter Ehlers