Thanks. I've played around with pure R solutions. The fastest rewrite of diff (for the lag-1 case) I've been able to find is this:

diff2 = function(x) {
  y = c(x, NA) - c(NA, x)
  y[2:length(x)]
}

#Compiling via 'cmpfun' doesn't seem to help (or hurt):
require(compiler)
diff2 = cmpfun(diff2)

But that only gets a ~10% improvement over the default 'diff' on my machine. Still too slow for my particular application.

I'm inclined towards Michael's suggestion of inline+Rcpp (or some other use of C under the hood). Could someone show me how to go about doing that?

Thanks!
Kevin

On Jan 28, 2012, at 9:14 AM, Peter Langfelder wrote:

> ehm... this doesn't take very many ideas.
>
> x = runif(n=10e6, min=0, max=1000)
> x = round(x)
>
> system.time( {
>   y = x[-1] - x[-length(x)]
> })
>
> I get about 0.5 seconds on my old laptop.
>
> HTH
>
> Peter
>
>
> On Fri, Jan 27, 2012 at 4:15 PM, Kevin Ummel <kevinum...@gmail.com> wrote:
>> Hi everyone,
>>
>> Speed is the key here.
>>
>> I need to find the difference between a vector and its one-period lag (i.e.
>> the difference between each value and the subsequent one in the vector).
>> Let's say the vector contains 10 million random integers between 0 and
>> 1,000. The solution vector will have 9,999,999 values, since there is no lag
>> for the 1st observation.
>>
>> In R we have:
>>
>> #Set up input vector
>> x = runif(n=10e6, min=0, max=1000)
>> x = round(x)
>>
>> #Find one-period difference
>> y = diff(x)
>>
>> Question is: How can I get the 'diff(x)' part as fast as absolutely
>> possible? I queried some colleagues who work with other languages, and they
>> provided equivalent solutions in Python and Clojure that, on their machines,
>> appear to be potentially much faster (I've put the code below in case anyone
>> is interested). However, they mentioned that the overhead in passing the
>> data between languages could kill any improvements. I don't have much
>> experience integrating other languages, so I'm hoping the community has some
>> ideas about how to approach this particular problem...
>>
>> Many thanks,
>> Kevin
>>
>> In iPython:
>>
>> In [3]: import numpy as np
>> In [4]: arr = np.random.randint(0, 1000, (10000000,1)).astype("int16")
>> In [5]: arr1 = arr[1:].view()
>> In [6]: timeit arr2 = arr1 - arr[:-1]
>> 10 loops, best of 3: 20.1 ms per loop
>>
>> In Clojure:
>>
>> (defn subtract-lag
>>   [n]
>>   (let [v (take n (repeatedly rand))]
>>     (time (dorun (map - v (cons 0 v))))))
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
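
For reference, a rough sketch of the "C under the hood" idea discussed in this thread: the lag-1 difference is a single pass over the vector, and an inline+Rcpp wrapper would essentially hand a loop like this an R numeric vector. The code below is plain standard C++, not actual Rcpp code. The function name diff_lag1 and the use of std::vector<double> are assumptions made for illustration, and no timings are claimed for it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the thread): lag-1 difference,
// out[i] = x[i + 1] - x[i], the same values R's diff(x) gives for lag = 1.
std::vector<double> diff_lag1(const std::vector<double>& x) {
    // Fewer than two elements: there are no differences to compute.
    if (x.size() < 2) {
        return {};
    }
    std::vector<double> out(x.size() - 1);
    // One pass, one subtraction per output element, no temporary copies.
    for (std::size_t i = 0; i + 1 < x.size(); ++i) {
        out[i] = x[i + 1] - x[i];
    }
    return out;
}
```

With Rcpp, the same loop would presumably take and return an Rcpp::NumericVector instead of std::vector<double>. That wrapper is left out here because it cannot be compiled outside of R.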