Sorry, guys. I'm not active on the listserv, so my last post was held by the moderator until after Dirk's solution was posted.
Excellent stuff.

thanks,
kevin

On Jan 29, 2012, at 8:37 AM, R. Michael Weylandt wrote:

> Have you not followed your own thread? Dirk is Mr. Rcpp himself and he
> gives an implementation that gives you 25x improvement here as well as
> tips for getting even more out of it:
>
> http://tolstoy.newcastle.edu.au/R/e17/help/12/01/2471.html
>
> Michael
>
> On Sat, Jan 28, 2012 at 12:28 PM, Kevin Ummel <kevinum...@gmail.com> wrote:
>> Thanks. I've played around with pure R solutions. The fastest re-write of
>> diff (for the 1 lag case) I can seem to find is this:
>>
>> diff2 = function(x) {
>>   y = c(x,NA) - c(NA,x)
>>   y[2:length(x)]
>> }
>>
>> #Compiling via 'cmpfun' doesn't seem to help (or hurt):
>> require(compiler)
>> diff2 = cmpfun(diff2)
>>
>> But that only gets ~10% improvement over default 'diff' on my machine. Still
>> too slow for my particular application.
>>
>> I'm inclined towards Michael's suggestion of inline+Rcpp (or some other use
>> of C under the hood).
>>
>> Could someone show me how to go about doing that?
>>
>> Thanks!
>> Kevin
>>
>> On Jan 28, 2012, at 9:14 AM, Peter Langfelder wrote:
>>
>>> ehm... this doesn't take very many ideas.
>>>
>>>
>>> x = runif(n=10e6, min=0, max=1000)
>>> x = round(x)
>>>
>>> system.time( {
>>>   y = x[-1] - x[-length(x)]
>>> })
>>>
>>> I get about 0.5 seconds on my old laptop.
>>>
>>> HTH
>>>
>>> Peter
>>>
>>>
>>> On Fri, Jan 27, 2012 at 4:15 PM, Kevin Ummel <kevinum...@gmail.com> wrote:
>>>> Hi everyone,
>>>>
>>>> Speed is the key here.
>>>>
>>>> I need to find the difference between a vector and its one-period lag
>>>> (i.e. the difference between each value and the subsequent one in the
>>>> vector). Let's say the vector contains 10 million random integers between
>>>> 0 and 1,000. The solution vector will have 9,999,999 values, since there
>>>> is no lag for the 1st observation.
>>>>
>>>> In R we have:
>>>>
>>>> #Set up input vector
>>>> x = runif(n=10e6, min=0, max=1000)
>>>> x = round(x)
>>>>
>>>> #Find one-period difference
>>>> y = diff(x)
>>>>
>>>> Question is: How can I get the 'diff(x)' part as fast as absolutely
>>>> possible? I queried some colleagues who work with other languages, and
>>>> they provided equivalent solutions in Python and Clojure that, on their
>>>> machines, appear to be potentially much faster (I've put the code below in
>>>> case anyone is interested). However, they mentioned that the overhead in
>>>> passing the data between languages could kill any improvements. I don't
>>>> have much experience integrating other languages, so I'm hoping the
>>>> community has some ideas about how to approach this particular problem...
>>>>
>>>> Many thanks,
>>>> Kevin
>>>>
>>>> In iPython:
>>>>
>>>> In [3]: import numpy as np
>>>> In [4]: arr = np.random.randint(0, 1000, (10000000,1)).astype("int16")
>>>> In [5]: arr1 = arr[1:].view()
>>>> In [6]: timeit arr2 = arr1 - arr[:-1]
>>>> 10 loops, best of 3: 20.1 ms per loop
>>>>
>>>> In Clojure:
>>>>
>>>> (defn subtract-lag
>>>>   [n]
>>>>   (let [v (take n (repeatedly rand))]
>>>>     (time (dorun (map - v (cons 0 v))))))
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
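
For anyone wondering what the "C under the hood" version of this looks like, here is a minimal standalone C++ sketch of the lag-1 difference loop. It is not Dirk's implementation from the post linked above, just an illustration of the idea; the function name lag1_diff is made up for this example, and the inline/Rcpp wrapping needed to call it from R is not shown.

```cpp
#include <cstddef>
#include <vector>

// Lag-1 difference: out[i] = x[i + 1] - x[i].
// For a plain numeric vector this is what R's diff(x) computes.
// lag1_diff is a hypothetical name used only for this sketch.
// Returns an empty vector when x has fewer than two elements.
std::vector<double> lag1_diff(const std::vector<double>& x) {
    if (x.size() < 2) return {};
    // Allocate the result once; it has one element fewer than the input.
    std::vector<double> out(x.size() - 1);
    // Single pass over the input, no temporary copies of x.
    for (std::size_t i = 0; i + 1 < x.size(); ++i) {
        out[i] = x[i + 1] - x[i];
    }
    return out;
}
```

Called on {1, 4, 9, 16} this gives {3, 5, 7}, the same values diff(c(1, 4, 9, 16)) gives in R. The likely source of any speedup is that the loop makes one pass and allocates only the result, whereas x[-1] - x[-length(x)] first builds two nearly full-length temporary vectors.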