Hi Tom,

> Now, try sorting and using a loop:
>
>> idx <- order(i)
>> xs <- x[idx]
>> is <- i[idx]
>> res <- array(NA, 1e6)
>> idx <- which(diff(is) > 0)
>> startidx <- c(1, idx + 1)
>> endidx <- c(idx, length(xs))
>> f1 <- function(x, startidx, endidx, FUN = sum) {
> +   for (j in 1:length(res)) {
> +     res[j] <- FUN(x[startidx[j]:endidx[j]])
> +   }
> +   res
> + }
>> unix.time(res1 <- f1(xs, startidx, endidx))
> [1] 6.86 0.00 7.04 NA NA

I wonder how much time the sorting, reordering, and creation of
startidx and endidx would add to this time? Either way, your code can
nicely be used to quickly create the small integer factors I would
need if the igroup functions get integrated. Thanks!
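Just to get a feel for that, I suppose the preprocessing could be
timed on its own with something like this (untested sketch; it assumes
x and i are the value and group vectors from your example):

prep <- function(x, i) {
  idx <- order(i)              # permutation that sorts by group
  xs  <- x[idx]                # values reordered by group
  is  <- i[idx]                # group ids in sorted order
  brk <- which(diff(is) > 0)   # last position of each group except the final one
  list(xs = xs,
       startidx = c(1, brk + 1),
       endidx = c(brk, length(xs)))
}
unix.time(p <- prep(x, i))

The total for the vectorized route would then be this preprocessing
time plus the run time of the cumsum version below.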
> For the case of sum (or averages), you can vectorize this using
> cumsum as follows. This won't work for median or max.
>
>> f2 <- function(x, startidx, endidx) {
> +   cum <- cumsum(x)
> +   res <- cum[endidx]
> +   res[2:length(res)] <- res[2:length(res)] - cum[endidx[1:(length(res) - 1)]]
> +   res
> + }
>> unix.time(res2 <- f2(xs, startidx, endidx))
> [1] 0.20 0.00 0.21 NA NA

Yes, that is quite a fast way to handle "sums".

> You can also use Luke Tierney's byte compiler
> (http://www.stat.uiowa.edu/~luke/R/compiler/) to speed up the loop for
> functions where you can't vectorize:
>
>> library(compiler)
>> f3 <- cmpfun(f1)
> Note: local functions used: FUN
>> unix.time(res3 <- f3(xs, startidx, endidx))
> [1] 3.84 0.00 3.91 NA NA

That looks interesting. Does it only work on specific operating
systems and processors? I will give it a try.

Thanks,
Kevin
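P.S. For the averages you mentioned, I guess the same cumsum trick
would work by dividing the group sums by the group sizes, along these
lines (untested; startidx and endidx as above):

sizes <- endidx - startidx + 1        # number of elements per group
means <- f2(xs, startidx, endidx) / sizes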