Re: [R] Need very fast application of 'diff' - ideas?

Kevin Ummel Sat, 28 Jan 2012 23:54:25 -0800

Thanks. I've played around with pure R solutions. The fastest rewrite of diff 
(for the lag-1 case) that I have been able to find is this:


diff2 = function(x) {
  y = c(x,NA) - c(NA,x)
  y[2:length(x)]
}

#Compiling via 'cmpfun' doesn't seem to help (or hurt):
require(compiler)
diff2 = cmpfun(diff2)

But that is only about 10% faster than the default 'diff' on my machine, which 
is still too slow for my particular application.
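
For anyone comparing the pure R variants, here is a minimal sketch that checks 
that they agree with 'diff' before timing them. It uses base R only. The name 
diff_slice is a hypothetical label for the slicing expression quoted further 
down in this thread, the seed is arbitrary, and the vector is shortened to one 
million values so the check runs quickly. Timings will differ by machine, so 
none are shown.

```r
# Three ways to take the lag-1 difference of a numeric vector.
set.seed(1)  # arbitrary seed, only so the run is reproducible
x <- round(runif(n = 1e6, min = 0, max = 1000))

# Pad-and-subtract version from this message.
diff2 <- function(x) {
  y <- c(x, NA) - c(NA, x)
  y[2:length(x)]
}

# Slicing version (hypothetical name): drop the first element, drop the last
# element, and subtract the two shifted copies.
diff_slice <- function(x) x[-1] - x[-length(x)]

# Confirm both rewrites return exactly what diff() returns.
stopifnot(identical(diff(x), diff2(x)))
stopifnot(identical(diff(x), diff_slice(x)))

# Time each variant; repeat a few times before drawing conclusions.
system.time(diff(x))
system.time(diff2(x))
system.time(diff_slice(x))
```

Note that diff2 as written assumes length(x) is at least 2.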

I'm inclined towards Michael's suggestion of inline+Rcpp (or some other use of 
C under the hood).

Could someone show me how to go about doing that?
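
A rough sketch of what the inline+Rcpp route might look like, assuming the 
'inline' and 'Rcpp' packages and a working C++ compiler are installed. The 
name diff_cpp and the argument name xs are hypothetical, and this is not code 
posted by anyone in this thread. Whether it actually beats 'diff' would need 
to be timed on the data in question.

```r
library(inline)  # provides cxxfunction()
library(Rcpp)    # provides the "Rcpp" plugin and Rcpp::NumericVector

# Compile a small C++ function that returns x[i + 1] - x[i] for each i.
diff_cpp <- cxxfunction(
  signature(xs = "numeric"),
  body = '
    Rcpp::NumericVector x(xs);              // wrap (or coerce) the R vector
    int n = x.size();
    if (n < 2) return Rcpp::NumericVector(0);  // nothing to difference
    Rcpp::NumericVector y(n - 1);
    for (int i = 0; i < n - 1; ++i) {
      y[i] = x[i + 1] - x[i];
    }
    return y;
  ',
  plugin = "Rcpp"
)

# Check against diff() on a small input before timing on the real data.
x <- c(3, 10, 4, 4, 9)
stopifnot(all(diff_cpp(x) == diff(x)))
```

Compilation happens once, when cxxfunction() is called, so that cost should be 
kept out of any timing comparison.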

Thanks!
Kevin

On Jan 28, 2012, at 9:14 AM, Peter Langfelder wrote:

> ehm... this doesn't take very many ideas.
> 
> 
> x = runif(n=10e6, min=0, max=1000)
> x = round(x)
> 
> system.time( {
>  y = x[-1] - x[-length(x)]
> })
> 
> I get about 0.5 seconds on my old laptop.
> 
> HTH
> 
> Peter
> 
> 
> On Fri, Jan 27, 2012 at 4:15 PM, Kevin Ummel <kevinum...@gmail.com> wrote:
>> Hi everyone,
>> 
>> Speed is the key here.
>> 
>> I need to find the difference between a vector and its one-period lag (i.e. 
>> the difference between each value and the subsequent one in the vector). 
>> Let's say the vector contains 10 million random integers between 0 and 
>> 1,000. The solution vector will have 9,999,999 values, since there is no lag 
>> for the 1st observation.
>> 
>> In R we have:
>> 
>> #Set up input vector
>> x = runif(n=10e6, min=0, max=1000)
>> x = round(x)
>> 
>> #Find one-period difference
>> y = diff(x)
>> 
>> Question is: How can I get the 'diff(x)' part as fast as absolutely 
>> possible? I queried some colleagues who work with other languages, and they 
>> provided equivalent solutions in Python and Clojure that, on their machines, 
>> appear to be potentially much faster (I've put the code below in case anyone 
>> is interested). However, they mentioned that the overhead in passing the 
>> data between languages could kill any improvements. I don't have much 
>> experience integrating other languages, so I'm hoping the community has some 
>> ideas about how to approach this particular problem...
>> 
>> Many thanks,
>> Kevin
>> 
>> In IPython:
>> 
>> In [3]: import numpy as np
>> In [4]: arr = np.random.randint(0, 1000, (10000000,1)).astype("int16")
>> In [5]: arr1 = arr[1:].view()
>> In [6]: timeit arr2 = arr1 - arr[:-1]
>> 10 loops, best of 3: 20.1 ms per loop
>> 
>> In Clojure:
>> 
>> (defn subtract-lag
>>  [n]
>>  (let [v (take n (repeatedly rand))]
>>    (time (dorun (map - v (cons 0 v))))))
>> 
>> 
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

