Dénes: A fair point! The only reason I have is ignorance -- I have not used data.table. I am not surprised that it and perhaps other packages (dplyr maybe?) can do things in a reasonable way very efficiently. The only problem is that it requires us to learn yet another package/paradigm. There may also be issues with its flexibility compared to base R data structures, but, again, I must plead ignorance here.
It is interesting that, mod the unsplit reconstruction of the original
vectors, Chuck's base R solution is as efficient as data.table's.

Cheers,
Bert

Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll

On Wed, Sep 16, 2015 at 4:42 PM, Dénes Tóth <toth.de...@ttk.mta.hu> wrote:
>
>
> On 09/16/2015 04:41 PM, Bert Gunter wrote:
>>
>> Yes! Chuck's use of mapply is exactly the split/combine strategy I was
>> looking for. In retrospect, exactly how one should think about it.
>> Many thanks to all for a constructive discussion.
>>
>> -- Bert
>>
>>
>> Bert Gunter
>>
>>>>>
>>>>> Use mapply like this on large problems:
>>>>>
>>>>>   unsplit(
>>>>>     mapply(
>>>>>       function(x,z) eval( x, list( y=z )),
>>>>>       expression( A=y*2, B=y+3, C=sqrt(y) ),
>>>>>       split( dat$Flow, dat$ASB ),
>>>>>       SIMPLIFY=FALSE),
>>>>>     dat$ASB)
>>>>>
>>>>> Chuck
>>>>>
>
>
> Is there any reason not to use data.table for this purpose, especially if
> efficiency is of concern?
>
> ---
>
> # load data.table and microbenchmark
> library(data.table)
> library(microbenchmark)
> #
> # prepare data
> DF <- data.frame(
>     ASB = rep_len(factor(LETTERS[1:3]), 3e5),
>     Flow = rnorm(3e5)^2)
> DT <- as.data.table(DF)
> DT[, ASB := as.character(ASB)]
> #
> # define functions
> #
> # Chuck's version
> fnSplit <- function(dat) {
>     unsplit(
>         mapply(
>             function(x,z) eval( x, list( y=z )),
>             expression( A=y*2, B=y+3, C=sqrt(y) ),
>             split( dat$Flow, dat$ASB ),
>             SIMPLIFY=FALSE),
>         dat$ASB)
> }
> #
> # data.table-way (IMHO, much easier to read)
> fnDataTable <- function(dat) {
>     dat[,
>         result :=
>             if (.BY == "A") {
>                 2 * Flow
>             } else if (.BY == "B") {
>                 3 + Flow
>             } else if (.BY == "C") {
>                 sqrt(Flow)
>             },
>         by = ASB]
> }
> #
> # benchmark
> #
> microbenchmark(fnSplit(DF), fnDataTable(DT))
> identical(fnSplit(DF), fnDataTable(DT)[, result])
>
> ---
>
> Actually, in Chuck's version the unsplit() part is slow. If the order is not
> of concern (e.g., DF is reordered before calling fnSplit), fnSplit is
> comparable to the DT-version.
>
>
> Denes

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
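
A minimal sketch of the point about unsplit(), in base R only. It assumes a small made-up data frame `dat` with the same two columns used in the thread (ASB, Flow); the names `parts`, `ordered` and `grouped` are illustrative and not from the original posts. It only shows that the per-group results are the same values whether they are put back in the original row order with unsplit() or simply concatenated in group order with unlist(); it makes no timing claims.

```r
# Toy data (hypothetical): same column names as in the thread, 9 rows only.
set.seed(1)
dat <- data.frame(
    ASB  = rep_len(factor(LETTERS[1:3]), 9),
    Flow = rnorm(9)^2)   # squared, so sqrt() is safe for group C

# Split Flow by group and apply a different expression to each group.
parts <- mapply(
    function(x, z) eval(x, list(y = z)),
    expression(A = y * 2, B = y + 3, C = sqrt(y)),
    split(dat$Flow, dat$ASB),
    SIMPLIFY = FALSE)

# Original row order: this is the unsplit() step discussed above.
ordered <- unsplit(parts, dat$ASB)

# Group order (all A, then all B, then all C): no unsplit() needed.
grouped <- unlist(parts, use.names = FALSE)

# Same values, only arranged differently.
identical(grouped, ordered[order(dat$ASB)])
```

If the rows are already sorted by ASB before the split, the two results coincide, which is the situation Denes describes as "DF is reordered before calling fnSplit".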