Dear David, Dennis and John, thank you for your replies. I'll take care of it in future.
Eliza

> Date: Mon, 8 Sep 2014 17:36:30 -0700
> Subject: Re: [R] splitting data
> From: djmu...@gmail.com
> To: eliza_bo...@hotmail.com
>
> Hi Eliza:
>
> Here are a few potential solutions. Given that you have 100 years of
> monthly data, it's likely that a package such as dplyr or data.table
> would be significantly faster than some of the alternatives offered to
> date. I'm assuming the game is to generate the monthly sums for each
> of A, B, C below.
>
> # Fake data set intended to replicate four years of data
> # To keep it simple, I only use 30 days a month.
> d <- data.frame(expand.grid(year = 1961:1964, month = seq(12), day = seq(30)),
>                 A = rpois(1440, 10), B = rpois(1440, 15), C = rpois(1440, 25))
>
>
> # plyr package solution - colwise() applies the same function to each
> # of the variables named within .()
>
> library(plyr)
> ymsums <- ddply(d, .(year, month), colwise(sum, .(A, B, C)))
>
>
> # dplyr package solution using the new piping operator %>% from the
> # magrittr package. (Think of a %>% b as: take the data in a and
> # then call function b on it. This idea can be strung in sequence:
> # the term on the left of %>% supplies the input data for the
> # function call on the right.)
>
> library(dplyr)
> # library(magrittr)
>
> ymsums2 <- d %>% group_by(year, month) %>%
>     summarise(Atot = sum(A), Btot = sum(B), Ctot = sum(C))
>
>
> # data.table package solution
>
> library(data.table)
>
> dt <- data.table(d, key = c("year", "month"))
> ymsums3 <- dt[, list(Atot = sum(A), Btot = sum(B), Ctot = sum(C)),
>               by = key(dt)]
>
> head(ymsums)
> head(ymsums2)
> head(ymsums3)
>
> dplyr was about 2.5 times faster than data.table and almost 30 times
> faster than plyr for this example. To be honest, though, I don't think
> I used the most efficient code for either of dplyr or data.table, so
> the relative timings may be somewhat misleading. OTOH, for this 1440
> line fake data set, dplyr processed it in 0.1 sec.
> with the code I used and data.table took 0.24 sec. If your data frame is
> 100 years in length, it should be approximately 25 times the length of
> mine, so we'd be talking about 2.5 sec. with dplyr and somewhere between
> 3.5 and 5 sec. with data.table, since the advantage of the way it sets
> keys improves processing speed in a relative sense as the size of the
> data set grows. That's not bad no matter which one you choose.
>
> BTW, it's possible to do it with reshape2 as follows:
>
> library(reshape2)
>
> # stack variables A-C, producing the long form
> dm <- melt(d, id.vars = c("year", "month", "day"))
>
> # reshape
> drt <- dcast(dm, year + month ~ variable, fun.aggregate = sum,
>              value.var = "value")
> head(drt)
>
> This is approximately 4 times faster than the plyr solution and about
> 3 times slower than data.table. This is about as fast as you can get
> it in reshape2.
>
> HTH,
> Dennis
>
> PS: I agree with David about the HTML postings. You've been on this
> list long enough to know what is expected. All it takes is a change or
> two in the settings of your mail client. I use gmail, and one
> change of setting is all it took for me...five years ago, the one and
> only time I was admonished to do so.
>
> On Mon, Sep 8, 2014 at 12:08 PM, eliza botto <eliza_bo...@hotmail.com> wrote:
> > Dear R members,
> >
> > I have this data frame of 100 years in the following format
> >
> > year month day A B C D
> >
> > where A, B, C and D are the numbers of items sold each day.
> > I am trying to
> >
> > 1- split the data by month for each year
> >
> > 2- then sum them up
> >
> > I am pasting here just a part of the data to make it clearer:
> >
> > structure(list(year = c(1961, 1961, 1961, 1961, 1961, 1961, 1961,
> > 1961, 1961, 1961, 1961, 1961), month = c(1, 1, 1, 1, 1, 1, 1,
> > 1, 1, 1, 1, 1), day = 1:12, A = 1:12, B = 3:14, C = 6:17, D = 16:27),
> > .Names = c("year",
> > "month", "day", "A", "B", "C", "D"), row.names = c(NA, 12L), class =
> > "data.frame")
> >
> > I initially tried to use the "dcast" command, but to no avail.
> >
> > Your kind help is needed.
> >
> > Thanks in advance
> >
> > Eliza
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
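For reference, the same monthly totals can also be computed in base R without any add-on package. The sketch below swaps in base R's aggregate() (not one of the package solutions discussed in the thread) and applies it to a data frame that recreates the 12-row dput() sample from the original question, so column D is included as well. The object names monthly, parts and sums are illustrative only. The second half spells out the two steps as the question phrases them: split by year and month, then sum.

```r
# Recreate the 12-row sample from the original question
# (same values as the dput() output: January 1961, days 1-12).
d <- data.frame(year = 1961, month = 1, day = 1:12,
                A = 1:12, B = 3:14, C = 6:17, D = 16:27)

# Base R formula interface: group by year and month, sum each of A-D.
# Assuming every month is present, 100 years of daily data gives 1200 rows.
monthly <- aggregate(cbind(A, B, C, D) ~ year + month, data = d, FUN = sum)
print(monthly)
# one row for 1961/1: A = 78, B = 102, C = 138, D = 258

# The same result written as the two steps in the question:
# 1) split the item columns by year/month, 2) sum each piece with colSums().
parts <- split(d[, c("A", "B", "C", "D")], list(d$year, d$month), drop = TRUE)
sums <- t(sapply(parts, colSums))  # one row per "year.month" group
print(sums)
```

The same calls apply unchanged to the full 100-year data frame. On data that size aggregate() is likely to be slower than the dplyr or data.table versions, so it is mainly useful when adding a package dependency is not wanted.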