Although I know there is another message in this thread I am replying to this message to be able to include the whole discussion to this point.
Gregor mentioned the possibility of extending the compiled code for cumsum so that it would handle the matrix case. The work by Dirk Eddelbuettel and Romain Francois on developing the Rcpp package make it surprisingly easy to create compiled code for this task. I am copying the Rcpp-devel list on this in case one of the readers of that list has time to create a sample implementation before I can get to it. It's midterm season and I have to tackle the stack of blue books on my desk before doing fun things like writing code. On Fri, Oct 15, 2010 at 2:23 AM, Joshua Wiley <jwiley.ps...@gmail.com> wrote: > Hi, > > You might look at Reduce(). It seems faster. I converted the matrix > to a list in an incredibly sloppy way (which you should not emulate) > because I cannot think of the simple way. > >> probs <- t(matrix(rep(1:10000000), nrow=10)) # matrix with row-wise >> probabilites >> F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs)); >> F[,1] <- probs[,1,drop=TRUE]; >> add <- function(x) {Reduce(`+`, x, accumulate = TRUE)} >> >> >> system.time(F.slow <- t(apply(probs, 1, cumsum))) > user system elapsed > 36.758 0.416 42.464 >> >> system.time(for (cc in 2:ncol(F)) { > + F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE]; > + }) > user system elapsed > 0.980 0.196 1.328 >> >> system.time(add(list(probs[,1], probs[,2], probs[,3], probs[,4], probs[,5], >> probs[,6], probs[,7], probs[,8], probs[,9], probs[,10]))) > user system elapsed > 0.420 0.072 0.539 >> > > .539 seconds for 10 vectors each with 1 million elements does not seem > bad. For 30000 each, on my system it took .014 seconds, which for > 200,000 iterations, I estimated as: > >> (200000*.014)/60/60 > [1] 0.7777778 > > (and this is on a stone age system with a single core processor and > only 756MB of RAM) > > It should not be difficult to get the list back to a matrix. > > Cheers, > > Josh > > > > On Thu, Oct 14, 2010 at 11:51 PM, Gregor <mailingl...@gmx.at> wrote: >> Dear all, >> >> Maybe the "easiest" solution: Is there anything that speaks against >> generalizing >> cumsum from base to cope with matrices (as is done in matlab)? E.g.: >> >> "cumsum(Matrix, 1)" >> equivalent to >> "apply(Matrix, 1, cumsum)" >> >> The main advantage could be optimized code if the Matrix is extreme nonsquare >> (e.g. 100,000x10), but the summation is done over the short side (in this >> case 10). >> apply would practically yield a loop over 100,000 elements, and >> vectorization w.r.t. >> the long side (loop over 10 elements) provides considerable efficiency gains. >> >> Many regards, >> Gregor >> >> >> >> >> On Tue, 12 Oct 2010 10:24:53 +0200 >> Gregor <mailingl...@gmx.at> wrote: >> >>> Dear all, >>> >>> I am struggling with a (currently) cost-intensive problem: calculating the >>> (non-normalized) cumulative distribution function, given the >>> (non-normalized) >>> probabilities. something like: >>> >>> probs <- t(matrix(rep(1:100),nrow=10)) # matrix with row-wise probabilites >>> F <- t(apply(probs, 1, cumsum)) #SLOOOW! >>> >>> One (already faster, but for sure not ideal) solution - thanks to Henrik >>> Bengtsson: >>> >>> F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs)); >>> F[,1] <- probs[,1,drop=TRUE]; >>> for (cc in 2:ncol(F)) { >>> F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE]; >>> } >>> >>> In my case, probs is a (30,000 x 10) matrix, and i need to iterate this >>> step around >>> 200,000 times, so speed is crucial. I currently can make sure to have no >>> NAs, but >>> in order to extend matrixStats, this could be a nontrivial issue. >>> >>> Any ideas for speeding up this - probably routine - task? >>> >>> Thanks in advance, >>> Gregor >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > > > -- > Joshua Wiley > Ph.D. Student, Health Psychology > University of California, Los Angeles > http://www.joshuawiley.com/ > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.