Re: [R] summarizing a data frame i.e. count -> group by

David Winsemius Sun, 23 Oct 2011 11:03:05 -0700


On Oct 23, 2011, at 1:29 PM, Giovanni Azua wrote:

Hello,

This is one problem at the time :)

I have a data frame df that looks like this:

> df <-read.table(textConnection(" time partitioning_mode workloadruntime

+ 1     1          sharding    query     607
+ 2     1          sharding    query      85
+ 3     1          sharding    query      52
+ 4     1          sharding    query      79
+ 5     1          sharding    query      77
+ 6     1          sharding    query      67
+ 7     1          sharding    query      98
+ 8     1          sharding  refresh    2932
+ 9     1          sharding  refresh    2870
+ 10    1          sharding  refresh    2877
+ 11    1          sharding  refresh    2868
+ 12    1       replication    query    2891
+ 13    1       replication    query    2907
+ 14    1       replication    query    2922
+ 15    1       replication    query    2937"))
>

> df$throughput <- ave(df$time, list(df$time, df$partitioning_mode),FUN=length)

> df
   time partitioning_mode workload runtime throughput
1     1          sharding    query     607         11
2     1          sharding    query      85         11
3     1          sharding    query      52         11
4     1          sharding    query      79         11
5     1          sharding    query      77         11
6     1          sharding    query      67         11
7     1          sharding    query      98         11
8     1          sharding  refresh    2932         11
9     1          sharding  refresh    2870         11
10    1          sharding  refresh    2877         11
11    1          sharding  refresh    2868         11
12    1       replication    query    2891          4
13    1       replication    query    2907          4
14    1       replication    query    2922          4
15    1       replication    query    2937          4

and if I could use SQL ... omg! I really wish I could! I would doexactly this:

You can of, course use package sqldf, which would undoubtedly be goodpractice for me, but this seemed like a typical situation for using'ave'. You do need to use the FUN= construction in 'ave' because thatargument appears after the triple dots in the argument list.

insert into throughput
 select time, partitioning_mode, count(*)
 from data.frame
 group by time, partitioning_mode
My attempted R versions are wrong and produce very cryptic errormessage:
throughput <- aggregate(x=df[,c("time", "partitioning_mode")],by=list(df$time,df$partitioning_mode), count)
Error in `[.default`(df2, u_id, , drop = FALSE) :
 incorrect number of dimensions
throughput <- aggregate(x=df, by=list(df$time,df$partitioning_mode), count)
Error in `[.default`(df2, u_id, , drop = FALSE) :
 incorrect number of dimensions
throughput <- tapply(X=df$time, INDEX=list(df$time,df$partitioning), FUN=count)
I cant comprehend what comes out from this one ... :(

and I thought C++ template errors were the most cryptic ;P

Many many thanks in advance,
Best regards,
Giovanni
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] summarizing a data frame i.e. count -> group by

Reply via email to