Funny, I couldn't run your code using R 2.10.1 (aggregate() requires the "by" argument to be a list). That said, take a look at the function ave():
> X <- rep(1:4)
> Y <- rep(letters[1:2],each=2)
> Z <- data.frame(X,Y)
> system.time(replicate(1000,{
+ A <- aggregate(Z$X, by=list(Y=Z$Y), FUN=mean)
+ M <- merge(Z,A,by="Y")[,3]
+ Result <- X - M
+ }))
   user  system elapsed
   3.57    0.01    3.58
> system.time(replicate(1000,{
+ Result <- Z$X - ave(Z$X,Z$Y)
+ }))
   user  system elapsed
   0.25    0.00    0.25

Cheers
Joris

On Thu, Jun 17, 2010 at 9:22 AM, Ben Cocker <b.coc...@ucl.ac.uk> wrote:
> Hi all,
>
> This is my first ever post, so forgive me and let me know if my
> etiquette is less than that required.
>
> I am searching for a faster way of subtracting group means within a
> data frame than the solution I've found so far, using AGGREGATE and
> MERGE.
>
> I'll flesh my question out using a trivial example: I have a data
> frame Z with two columns - one X of values and one Y of labels:
>
>> Z
>   X Y
> 1 1 4
> 2 2 4
> 3 3 5
> 4 4 5
>
> I want to take the group means (for the two groups Y=4 and Y=5) and
> subtract them from X resulting in the vector Result = t(-0.5 0.5 -0.5
> 0.5). I have found a (slow) way of achieving this, using the
> AGGREGATE function to get the group means and then MERGE to construct
> an appropriate vector of these values, M:
>
>> A <- aggregate(Z$X, by=Z$Y, FUN=mean)
>> A
>   Y   X
> 1 4 1.5
> 2 5 3.5
>
>> M <- merge(Z,A,by="Y")[,3]
>> M
> [1] 1.5 1.5 3.5 3.5
>
>> Result <- X - M
>> Result
>      X
> 1 -0.5
> 2  0.5
> 3 -0.5
> 4  0.5
>
> My problem: for lots of records, while AGGREGATE is very fast, MERGE
> is very slow - in real life I need to call this routine many times
> over a very large dataset. Could anyone help me find a faster way of
> achieving the same goal?
>
> Many thanks,
>
> Ben Cocker
> MSc Statistics at UCL, London, UK
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
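
P.S. A minimal sketch of what the ave() call is doing, in case it helps. ave() returns one value per row (the mean of that row's group, since FUN defaults to mean), so the result stays in the original row order of Z. merge(), by contrast, sorts on the key by default, so the aggregate/merge version only lines up with X when Z is already sorted by Y. The tapply() lookup below is just an alternative I added for comparison, and the names group_mean, centered, means and centered2 are made up for illustration:

```r
# Same toy data as in the timing example
X <- rep(1:4)
Y <- rep(letters[1:2], each = 2)
Z <- data.frame(X, Y)

# ave() gives the group mean of X for every row, in the row order of Z
group_mean <- ave(Z$X, Z$Y)      # FUN defaults to mean
centered   <- Z$X - group_mean   # -0.5  0.5 -0.5  0.5

# Alternative sketch: compute each group mean once with tapply(),
# then look the means up by group label (as.character() also
# covers the case where Y is stored as a factor)
means     <- tapply(Z$X, Z$Y, mean)
centered2 <- Z$X - means[as.character(Z$Y)]

# Both approaches agree once names/dims are dropped
stopifnot(isTRUE(all.equal(centered, as.vector(centered2))))
```

I have not timed the tapply() variant on large data, so treat it as something to benchmark rather than a recommendation.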