Take a look at ?split (and ?unsplit), e.g.:
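# Example data: 100 durations with two grouping attributes
Dur <- rnorm(100)
Attr1 <- rep(c("A","B"), each=50)
Attr2 <- rep(c("A","B"), times=50)
ap.dat <- data.frame(Attr1, Attr2, Dur)
# One grouping key per row, combining both attributes
split.fact <- paste(ap.dat$Attr1, ap.dat$Attr2)
# Split into one data frame per group and add the group mean to each
ap.list <- split(ap.dat, split.fact)
ap.mean <- lapply(ap.list, function(x){
  x$meanDur <- rep(mean(x$Dur), nrow(x))
  return(x)
})
# Reassemble the pieces in the original row order
ap.dat.fast <- unsplit(ap.mean, split.fact)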
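If you want to convince yourself that unsplit puts every row back where it came from, a check along the following lines might help. This is only a sketch: the seed and the names grp.means and expected are mine, and tapply is just used here as an independent way to get the group means.

```r
# Rebuild the example with a fixed seed (the seed is an arbitrary choice)
set.seed(1)
Dur <- rnorm(100)
Attr1 <- rep(c("A", "B"), each = 50)
Attr2 <- rep(c("A", "B"), times = 50)
ap.dat <- data.frame(Attr1, Attr2, Dur)

split.fact <- paste(ap.dat$Attr1, ap.dat$Attr2)
ap.list <- split(ap.dat, split.fact)
ap.mean <- lapply(ap.list, function(x) {
  x$meanDur <- rep(mean(x$Dur), nrow(x))
  x
})
ap.dat.fast <- unsplit(ap.mean, split.fact)

# Independent computation: group means via tapply, looked up per row by key
grp.means <- tapply(ap.dat$Dur, split.fact, mean)
expected <- as.numeric(grp.means[split.fact])

# Rows come back in the original order and carry the right group mean
stopifnot(identical(ap.dat.fast$Dur, ap.dat$Dur))
stopifnot(isTRUE(all.equal(ap.dat.fast$meanDur, expected)))
```

Both stopifnot calls should pass silently if the reassembled data frame is consistent with the original.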
Running system.time on 1000 replicates gives (split/unsplit first, then your aggregate/sapply version):
> system.time(replicate(1000,{
+ split.fact <- paste(ap.dat$Attr1,ap.dat$Attr2)
+ ap.list <-split(ap.dat,split.fact)
+ ap.mean <-lapply(ap.list,functi .... [TRUNCATED]
user system elapsed
4.88 0.00 4.88
> system.time(replicate(1000,{
+ avgDur <- aggregate(ap.dat[["Dur"]], by = list(ap.dat[["Attr1"]],
+ ap.dat[["Attr2"]]), FUN="mean")
+ meanDur <- sapp .... [TRUNCATED]
user system elapsed
58.00 0.11 58.13
>
So the split/unsplit version should be roughly ten times faster (4.88 s versus 58.13 s elapsed above).
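For what it's worth, base R also has ave(), which returns a group statistic aligned with the original rows in a single call. I have not timed it against the code above, so treat the following as an untested-for-speed sketch rather than a recommendation; the seed and the name timing.ave are my own.

```r
# Same example data as above, with an arbitrary fixed seed
set.seed(42)
Dur <- rnorm(100)
Attr1 <- rep(c("A", "B"), each = 50)
Attr2 <- rep(c("A", "B"), times = 50)
ap.dat <- data.frame(Attr1, Attr2, Dur)

# ave() applies FUN within each combination of the grouping vectors
# and returns a vector as long as the input, in the original row order
ap.dat$meanDur <- ave(ap.dat$Dur, ap.dat$Attr1, ap.dat$Attr2, FUN = mean)

# Time it the same way as the other approaches to compare on your machine
timing.ave <- system.time(replicate(1000, {
  ave(ap.dat$Dur, ap.dat$Attr1, ap.dat$Attr2, FUN = mean)
}))
print(timing.ave)
```

The timings will depend on your machine and on the size of the real data, so it is worth rerunning the comparison on a realistic subset.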
Cheers
Joris
On Tue, Jun 1, 2010 at 4:48 PM, Stella Pachidi <[email protected]> wrote:
> Dear R experts,
>
> I would really appreciate any ideas on how to use the aggregate
> method more efficiently.
>
> More specifically, I would like to calculate the mean of certain
> values in a data frame, grouped by various attributes, and then
> create a new column in the data frame that holds the corresponding
> mean for every row. I attach part of my code:
>
> matchMean <- function(ind,dataTable,aggrTable)
> {
> index <- which((aggrTable[,1]==dataTable[["Attr1"]][ind]) &
> (aggrTable[,2]==dataTable[["Attr2"]][ind]))
> as.numeric(aggrTable[index,3])
> }
>
> avgDur <- aggregate(ap.dat[["Dur"]], by = list(ap.dat[["Attr1"]],
> ap.dat[["Attr2"]]), FUN="mean")
> meanDur <- sapply((1:length(ap.dat[,1])), FUN=matchMean, ap.dat, avgDur)
> ap.dat <- cbind (ap.dat, meanDur)
>
> As I deal with a very large dataset, my matching function takes a
> long time to run, so I would be really grateful for any ideas on how
> to speed up this matching process.
>
> Thank you very much in advance!
>
> Kind regards,
> Stella
>
>
>
> --
> Stella Pachidi
> Master in Business Informatics student
> Utrecht University
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Joris Meys
Statistical Consultant
Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control
Coupure Links 653
B-9000 Gent
tel : +32 9 264 59 87
[email protected]