Re: [R] Improve code efficient with do.call, rbind and split contruction

2016-09-03 Thread Charles C. Berry
On Sat, 3 Sep 2016, Bert Gunter wrote: Chuck et al.: As I said previously, my intuition about the relative efficiency of tapply() and duplicated() in the context of this thread was wrong. My `intuition' was wrong, too. But tapply() uses split(), which runs quite fast. So not a big surprise,

Re: [R] Improve code efficient with do.call, rbind and split contruction

2016-09-03 Thread Bert Gunter
Chuck et al.: As I said previously, my intuition about the relative efficiency of tapply() and duplicated() in the context of this thread was wrong. But I wondered exactly how and to what extent. So I've fooled around a bit more and think I understand. Using the example I gave, the key is to repl

Re: [R] Improve code efficient with do.call, rbind and split contruction

2016-09-02 Thread Bert Gunter
Chuck: I think this is quite clever. But note that the which() is unnecessary: logical indexing suffices, e.g. df[!duplicated(df[,c("f","g")],fromLast = TRUE),] I thought that your approach would be faster because it moves comparisons from the tapply() to C code. But I was wrong. e.g. for 1e6 ro
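A minimal sketch of the duplicated() idiom quoted above, for readers who want to try it. The data frame here is made up for illustration (only the column names f and g come from the snippet; the y column, the sizes and the seed are assumptions), and it is not the data used in the thread.

```r
# Hypothetical example data: two grouping columns (f, g) and a value column (y).
set.seed(1)
df <- data.frame(f = sample(letters[1:3], 20, replace = TRUE),
                 g = sample(1:2, 20, replace = TRUE),
                 y = rnorm(20))

# duplicated(..., fromLast = TRUE) flags every row that has a later row with the
# same (f, g) pair, so negating it keeps only the last row of each group.
keep <- !duplicated(df[, c("f", "g")], fromLast = TRUE)

# The logical vector indexes rows directly; wrapping it in which() is not needed.
last_rows <- df[keep, ]
```

The result keeps the rows in their original order, with one row per (f, g) combination that actually occurs in the data.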

Re: [R] Improve code efficient with do.call, rbind and split contruction

2016-09-02 Thread Charles C. Berry
On Fri, 2 Sep 2016, Bert Gunter wrote: [snip] The "trick" is to use tapply() to select the necessary row indices of your data frame and forget about all the do.call and rbind stuff. e.g. I agree the way to go is "select the necessary row indices" but I get there a different way. See below.
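The "select the necessary row indices" idea can be sketched with tapply() roughly as follows. This is an illustration under assumptions, not the code posted in the thread: the data frame, its column names and the use of max() over row numbers are all made up here.

```r
# Hypothetical example data: two grouping columns (f, g) and a value column (y).
set.seed(1)
df <- data.frame(f = sample(letters[1:3], 20, replace = TRUE),
                 g = sample(1:2, 20, replace = TRUE),
                 y = rnorm(20))

# For each (f, g) group, find the largest row number, i.e. the group's last row.
idx <- tapply(seq_len(nrow(df)), df[c("f", "g")], max)

# Combinations that never occur come back as NA; drop them before indexing.
idx <- idx[!is.na(idx)]

# A single subsetting step replaces the do.call(rbind, ...) reassembly.
last_rows <- df[sort(idx), ]
```

Sorting the indices keeps the selected rows in their original order; skip the sort() if group order is preferred.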

Re: [R] Improve code efficient with do.call, rbind and split contruction

2016-09-02 Thread Jun Shen
Hi Bert, This is the best method I have seen this year! do.call, rbind has just gone to the museum :) It took ~30 seconds to get the results. You deserve a medal. Jun On Fri, Sep 2, 2016 at 1:51 PM, Bert Gunter wrote: > This is the sort of thing that dplyr or the data.table packages can > proba

Re: [R] Improve code efficient with do.call, rbind and split contruction

2016-09-02 Thread ruipbarradas
Hello, Try ?aggregate, it's probably faster. With a made up data.frame, since you haven't provided us with a dataset, simout.s1 <- data.frame(SID = rep(LETTERS[1:2], 10), DOSENO = rep(letters[1:4], each = 5), value = rnorm(20)) res2 <- aggregate(simout.s1$value,
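The aggregate() call in the preview above is cut off, so the following is only a guess at the general shape of such a solution, using the made-up data frame from the snippet. The formula interface and the anonymous "last element" function are assumptions, not the poster's code. Note that this returns the last value per group rather than the complete last row.

```r
# Made-up data frame as given in the message preview.
simout.s1 <- data.frame(SID = rep(LETTERS[1:2], 10),
                        DOSENO = rep(letters[1:4], each = 5),
                        value = rnorm(20))

# Assumed form of the call: for each SID/DOSENO group, keep the last value.
res2 <- aggregate(value ~ SID + DOSENO, data = simout.s1,
                  FUN = function(x) x[length(x)])
```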

Re: [R] Improve code efficient with do.call, rbind and split contruction

2016-09-02 Thread Bert Gunter
This is the sort of thing that dplyr or the data.table packages can probably do elegantly and efficiently. So you might consider looking at them. But as I use neither, let me suggest a base R solution. As you supplied no data for a reproducible example, I'll make up my own and hopefully I have unde

[R] Improve code efficient with do.call, rbind and split contruction

2016-09-02 Thread Jun Shen
Dear list, I have the following line of code to extract the last line of the split data and put them back together. do.call(rbind,lapply(split(simout.s1,simout.s1[c('SID','DOSENO')]),function(x)x[nrow(x),])) the problem is when I have a huge dataset, it takes too long to run. (actually it's > 3 h
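For reference, the construction in the question can be reproduced on a small data set like this. The data frame below is hypothetical (only the names simout.s1, SID and DOSENO come from the question; the value column and the sizes are assumptions), so timings on it say nothing about the real data.

```r
# Hypothetical stand-in for the poster's data: 2 subjects x 4 doses, 20 rows.
simout.s1 <- data.frame(SID = rep(LETTERS[1:2], 10),
                        DOSENO = rep(letters[1:4], each = 5),
                        value = rnorm(20))

# Split into one data frame per SID/DOSENO group, take the last row of each,
# then bind the pieces back together. Each step allocates a new data frame,
# which is what becomes slow when there are many groups.
pieces <- split(simout.s1, simout.s1[c("SID", "DOSENO")])
last_rows <- do.call(rbind, lapply(pieces, function(x) x[nrow(x), ]))
```

The row names of the result are the group labels generated by split() (for example "A.a"), not the original row numbers.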