Thanks for that terrific post Bill - I learnt a lot. One question, should this line
d <- within(d, interaction(g1, g2, drop=TRUE)) be this ? d <- within(d, g1g2 <- interaction(g1, g2, drop=TRUE)) Michael On 23 September 2010 03:50, William Dunlap <wdun...@tibco.com> wrote: >> -----Original Message----- >> From: r-help-boun...@r-project.org >> [mailto:r-help-boun...@r-project.org] On Behalf Of Seth W Bigelow >> Sent: Tuesday, September 21, 2010 4:22 PM >> To: bill.venab...@csiro.au >> Cc: r-help@r-project.org >> Subject: Re: [R] Doing operations by grouping variable >> >> Aah, that is the sort of truly elegant solution I have been >> seeking. And >> it's wrapped up in a nice programming shortcut to boot (i.e., >> the within >> statement). I retract anything I may have said about tapply >> being clunky. >> >> Many thanks >> >> --Seth >> >> Dr. Seth W. Bigelow >> Biologist, USDA-FS Pacific Southwest Research Station >> 1731 Research Park Drive, Davis California >> >> <bill.venab...@csiro.au> >> 09/21/2010 03:15 PM >> >> To <sbige...@fs.fed.us> >> >> You left out the subscript. Why not just do >> >> d <- within(data.frame(group = rep(1:5, each = 5), >> variable = rnorm(25)), >> scaled <- variable/tapply(variable, group, max)[group]) > > This approach can be tricky when there is more than one > grouping variable. E.g., suppose we have grouping variables > g1 and g2: > > d <- data.frame(x=1:10, > g1=LETTERS[rep(11:12,each=5)], > g2=letters[rep(21:23,c(3,3,4))]) > > d > x g1 g2 > 1 1 K u > 2 2 K u > 3 3 K u > 4 4 K v > 5 5 K v > 6 6 L v > 7 7 L w > 8 8 L w > 9 9 L w > 10 10 L w > and we want to divide each x value by it max for each > g1*g2 group (6 possible groups, of which 4 are in the > data). > > You can extend Bill V.'s approach with > > with(d, x/tapply(x, list(g1,g2), FUN=max)[cbind(g1,g2)]) > [1] 0.3333333 0.6666667 1.0000000 0.8000000 1.0000000 > [6] 1.0000000 0.7000000 0.8000000 0.9000000 1.0000000 > That would fail if g1 and g2 were not factors but were > integer vectors. Try it with > > di <- data.frame(x=1:10, > g1=rep(11:12,each=5), > g2=rep(21:23,c(3,3,4))) > > with(di, x/tapply(x, list(g1,g2), FUN=max)[cbind(g1,g2)]) > Error in tapply(x, list(g1, g2), FUN = max)[cbind(g1, g2)] : > subscript out of bounds > > To avoid that problem you can call tapply with no FUN > to get the indices to subscript by > > with(d, x/tapply(x, list(g1,g2), FUN=max)[tapply(x, list(g1, g2))]) > [1] 0.3333333 0.6666667 1.0000000 0.8000000 1.0000000 > [6] 1.0000000 0.7000000 0.8000000 0.9000000 1.0000000 > > The misleadingly named ave() can avoid the need to do the > subscripting after tapply but has other problems > > with(d, x/ave(x, g1, g2, FUN=max)) > [1] 0.3333333 0.6666667 1.0000000 0.8000000 1.0000000 > [6] 1.0000000 0.7000000 0.8000000 0.9000000 1.0000000 > Warning messages: > 1: In FUN(X[[6L]], ...) : no non-missing arguments to max; returning > -Inf > 2: In FUN(X[[6L]], ...) : no non-missing arguments to max; returning > -Inf > It gives the right answer but it is calling FUN even for > the empty interaction groups. For some FUN's this would > abort the call, not just give a warning. In any case it > is a waste of time. > > In either case you can also use the interaction() function to > change the multiple grouping vectors into one: > > d <- within(d, interaction(g1, g2, drop=TRUE)) > > with(d, x/ave(x, g1g2, FUN=max)) > [1] 0.3333333 0.6666667 1.0000000 0.8000000 1.0000000 > [6] 1.0000000 0.7000000 0.8000000 0.9000000 1.0000000 > > with(d, x/tapply(x, g1g2, FUN=max)[g1g2]) > K.u K.u K.u K.v K.v L.v > 0.3333333 0.6666667 1.0000000 0.8000000 1.0000000 1.0000000 > L.w L.w L.w L.w > 0.7000000 0.8000000 0.9000000 1.0000000 > > with(d, x/tapply(x, g1g2, FUN=max)[tapply(x, g1g2)]) > K.u K.u K.u K.v K.v L.v > 0.3333333 0.6666667 1.0000000 0.8000000 1.0000000 1.0000000 > L.w L.w L.w L.w > 0.7000000 0.8000000 0.9000000 1.0000000 > The names are probably unwanted in the tapply cases; use unname > to get rid of them. > > Bill Dunlap > Spotfire, TIBCO Software > wdunlap tibco.com > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.