On 28 January 2012 at 16:20, Hans W Borchers wrote: | R. Michael Weylandt <michael.weylandt <at> gmail.com> writes: | > | > I'd write your own diff() that eliminates the method dispatch and | > argument checking that diff -> diff.default does. | > | > x[-1] - x[-len(x)] # is all you really need. | > (# you could also try something like c(x[-1], NA) - x which may be | > marginally faster as it only subsets x once but you should profile to | > find out) | > | > is probably about as fast as you can get within pure R code (the | > function overhead will add a little bit of time as well, so if speed | > is truly the only thing that matters, best not to use it. If you wanna | > go for even more speed, you'll have to go to compiled code; I'd | > suggest inline+Rcpp as the easiest way to do so. That could get it | > down to a single pass through the vector in pure C (or nice C++) which | > seems to be a lower bound for speed. | > | > Michael | | | Python has become astonishingly fast during the last years. On an iMAc with | 3.06 GHz I can see the following timings (though I do feel a bit suspicious | about the timings Python reports): | | Python 0.040 s Version 2.6.1, 1e7 integer elements | Matlab 0.095 s Matlab's diff function (Version R2011b) | Matlab 0.315 s Matlab using x(2:N)-x(1:(N-1)) | R 2.14.1 0.375 s R's diff() function | R 0.365 s R using x[-1]-x[-N] | R 0.270 s R using c(x[-1],NA)-x) | R+Fortran 0.180 s R function calling .Fortran | R+C 0.180 s R function calling .C | | where---as an example---the C code looks like: | | void diff(int *n, int *x, int *d) | { for (long i=0; i<*n-2; i++) d[i] = x[i+1] - x[i]; }
We looked at a number of these operations in the context of the Rcpp benchmark for vector convolution. If you really really care about the last bit of speed you do a little better using pointer arithmetic rather than [] indexing. Also, R always checks for NA which is costly. | There appears to be a factor of 4 between R+compiled code and Python code. | It is also interesting to see that in Matlab 'diff' is considerably faster | than differencing vectors, while in R it is slower. | | P. S.: To make the comparison fair I have used the following Python call: | | python -m timeit -n 1 -r 1 | -s 'import numpy' | -s 'arr = numpy.random.randint(0, 1000, (10000000,1)).astype("int32")' | 'diff = arr[1:] - arr[:-1]' | | i.e., used 32-bit integers and included the indexing in the loop. We should be able to close the gap from 40ms for Python to 180ms for R + C. I suspect there is some room left. Can you post your codes ? Dirk | > On Fri, Jan 27, 2012 at 7:15 PM, Kevin Ummel <kevinummel <at> gmail.com> wrote: | > > Hi everyone, | > > | > > Speed is the key here. | > > | > > I need to find the difference between a vector and its one-period lag | > > (i.e. the difference between each value and the subsequent one in the | > > vector). Let's say the vector contains 10 million random integers | > > between 0 and 1,000. The solution vector will have 9,999,999 values, | > > since their is no lag for the 1st observation. | > > | > > In R we have: | > > | > > #Set up input vector | > > x = runif(n=10e6, min=0, max=1000) | > > x = round(x) | > > | > > #Find one-period difference | > > y = diff(x) | > > | > > Question is: How can I get the 'diff(x)' part as fast as absolutely | > > possible? I queried some colleagues who work with other languages, and | > > they provided equivalent solutions in Python and Clojure that, on their | > > machines, appear to be potentially much faster | > > (I've put the code below in case anyone is interested). | > > However, they mentioned that the overhead in passing the data between | > > languages could kill any improvements. I don't have much experience | > > integrating other languages, so I'm hoping the community has some ideas | > > about how to approach this particular problem... | > > | > > Many thanks, | > > Kevin | > > | > > In iPython: | > > | > > In [3]: import numpy as np | > > In [4]: arr = np.random.randint(0, 1000, (10000000,1)).astype("int16") | > > In [5]: arr1 = arr[1:].view() | > > In [6]: timeit arr2 = arr1 - arr[:-1] | > > 10 loops, best of 3: 20.1 ms per loop | > > | > > In Clojure: | > > | > > (defn subtract-lag | > > [n] | > > (let [v (take n (repeatedly rand))] | > > (time (dorun (map - v (cons 0 v)))))) | > | | ______________________________________________ | R-help@r-project.org mailing list | https://stat.ethz.ch/mailman/listinfo/r-help | PLEASE do read the posting guide http://www.R-project.org/posting-guide.html | and provide commented, minimal, self-contained, reproducible code. -- "Outside of a dog, a book is a man's best friend. Inside of a dog, it is too dark to read." -- Groucho Marx ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.