Hi everyone,

Speed is the key here.

I need to find the difference between a vector and its one-period lag (i.e. the 
difference between each value and the one preceding it in the vector). Let's say 
the vector contains 10 million random integers between 0 and 1,000. The 
solution vector will have 9,999,999 values, since there is no lag for the 1st 
observation.
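
To make the definition concrete, here is a minimal sketch in plain Python on a 
four-element vector. The function name lag_diff is only illustrative (it is not 
from any library), and this version is meant to show the expected result, not 
to be fast:

```python
def lag_diff(values):
    """Return the one-period differences values[i] - values[i-1]."""
    # Pair each element with its successor; the first element has no
    # predecessor, so the result is one element shorter than the input.
    return [nxt - cur for cur, nxt in zip(values, values[1:])]


x = [3, 8, 5, 10]
y = lag_diff(x)
print(y)  # [5, -3, 5]
```

Four inputs give three differences, just as 10 million inputs give 9,999,999.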

In R we have:

#Set up input vector
x = runif(n=10e6, min=0, max=1000)
x = round(x)

#Find one-period difference
y = diff(x)

The question is: how can I make the 'diff(x)' step as fast as possible? I 
asked some colleagues who work with other languages, and they provided 
equivalent solutions in Python and Clojure that, on their machines, appear to 
be much faster (I've put the code below in case anyone is 
interested). However, they mentioned that the overhead of passing the data 
between languages could wipe out any improvement. I don't have much experience 
integrating other languages, so I'm hoping the community has some ideas about 
how to approach this particular problem.

Many thanks,
Kevin

In IPython:

In [3]: import numpy as np
In [4]: arr = np.random.randint(0, 1000, (10000000,1)).astype("int16")
In [5]: arr1 = arr[1:].view()
In [6]: timeit arr2 = arr1 - arr[:-1]
10 loops, best of 3: 20.1 ms per loop
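
One caveat about comparing this timing with R: the array above is cast to int16 
(2 bytes per value), whereas round(runif(...)) in R returns doubles (8 bytes 
per value), so the two runs move different amounts of memory. A rough sketch of 
the size gap, using only the standard-library array module; the helper name 
payload_bytes is made up for illustration, and the figures assume the usual 
2-byte short and 8-byte double:

```python
from array import array


def payload_bytes(typecode, n):
    """Bytes needed to hold n values of the given array typecode."""
    return array(typecode).itemsize * n


n = 10_000_000
int16_bytes = payload_bytes("h", n)   # "h" = signed short, comparable to int16
double_bytes = payload_bytes("d", n)  # "d" = C double, comparable to an R numeric
print(int16_bytes, double_bytes, double_bytes // int16_bytes)
```

Some of the apparent speed-up may therefore come from the narrower storage type 
and not from the language itself.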

In Clojure:

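(defn subtract-lag
  [n]
  ;; doall realizes the random integers (0 to 1000) up front, so that
  ;; only the subtraction is timed and not the input generation
  (let [v (doall (take n (repeatedly #(rand-int 1001))))]
    ;; pairing (rest v) with v yields n-1 differences, like diff(x) in R
    (time (dorun (map - (rest v) v)))))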




