I repeated your experiment:
> n <- 1000000
> x <- data.frame(A=sample(LETTERS,n,TRUE), 
> B=sample(letters[1:4],n,TRUE),C=sample(LETTERS[1:4], n, TRUE), data=runif(n))
> system.time(x.agg <- aggregate(x$data, list(x$A, x$B, x$C), mean))
   user  system elapsed
  1.824   0.212   2.038


Now I use my own data:
> length(levels(group))
[1] 7049
> length(levels(type))
[1] 11704
> y<-data.frame(group,type,signal)
> system.time(y.agg <- aggregate(y$signal, list(y$group,y$type), mean))
   (I killed it after 30 minutes)



> Date: Sat, 26 Jan 2008 19:55:51 -0500
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: [R] Comparison of aggregate in R and group by in mysql
> CC: [EMAIL PROTECTED]
> 
> How large is your dataframe?  How much memory do you have on your
> system?  Are you paging?  Here is a test I ran with a data frame with
> 1,000,000 entries and it seems to be fast:
> 
> > n <- 1000000
> > x <- data.frame(A=sample(LETTERS,n,TRUE), B=sample(letters[1:4],n,TRUE),
> +     C=sample(LETTERS[1:4], n, TRUE), data=runif(n))
> > system.time(x.agg <- aggregate(x$data, list(x$A, x$B, x$C), mean))
>    user  system elapsed
>    2.65    0.34    3.00
> >
> 
> On Jan 26, 2008 6:45 PM, zhihuali <[EMAIL PROTECTED]> wrote:
> >
> > Hi, netters,
> >
> > First of all, thanks a lot for all the prompt replies to my earlier 
> > question about "merging" data frames in R.
> > Actually that's an equivalence to the "join" clause in mysql.
> >
> > Now I have another question. Suppose I have a data frame X with lots of 
> > columns/variables:
> > Name, Age,Group, Type, Salary.
> > I wanna do a subtotal of salaries:
> > aggregate(X$Salary, by=list(X$Group,X$Age,X$Type),Fun=mean)
> >
> > When the levels of Group and Type are huge, it took R forever to finish the 
> > aggregation.
> > And I used gc to find that the memory usage was big too.
> >
> > However, in mysql, it took seconds to finish a similar job:
> > select Group,Age,Type ,avg(Salary)  from X group by  Group,Age,Type
> >
> > Is it because mysql is superior in doing such kind of things? Or my R 
> > command is not efficient enough? Why did R have to consume huge memories to 
> > do the aggregation?
> >
> > Thanks again!
> >
> > Zhihua Li
> >
> > _________________________________________________________________
> > ÌìÁ¹ÁË£¬ÌíÒÂÁË£¬Ð͝ÁË£¬"Æß¼þ"ÁË
> > http://get.live.cn
> >        [[alternative HTML version deleted]]
> >
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> 
> 
> 
> -- 
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
> 
> What is the problem you are trying to solve?

_________________________________________________________________
MSNÊ¥µ®ÀñÎï»ðÈȵdz¡£¬Ãâ·Ñ·¢·ÅÖУ¬¿ìÀ´ÁìÈ¡°É£¡
http://im.live.cn/emoticons/?ID=18
        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to