Re: [R] Need very fast application of 'diff' - ideas?

R. Michael Weylandt Sun, 29 Jan 2012 07:39:48 -0800

Have you not followed your own thread? Dirk is Mr. Rcpp himself, and he
posted an implementation that gives you a 25x improvement, along with
tips for getting even more out of it:


http://tolstoy.newcastle.edu.au/R/e17/help/12/01/2471.html

Michael
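
For anyone landing here without the linked post: the core of a compiled lag-1 difference is a single loop over adjacent elements. Below is a minimal sketch in plain C++. It is not Dirk's implementation, the function name lag1_diff is made up for illustration, and the Rcpp/inline glue needed to call it from R is deliberately left out.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (the name is illustrative, not from the thread).
// Returns x[i+1] - x[i] for i = 0 .. n-2, i.e. the same values as
// R's diff(x) with lag 1. The result has n-1 elements.
std::vector<double> lag1_diff(const std::vector<double>& x) {
    if (x.size() < 2) {
        return {};  // fewer than two values: no lag to difference against
    }
    std::vector<double> out(x.size() - 1);
    // Single pass, no temporary copies of the input.
    for (std::size_t i = 0; i + 1 < x.size(); ++i) {
        out[i] = x[i + 1] - x[i];
    }
    return out;
}
```

Called on {1, 4, 9, 16} this returns {3, 5, 7}, matching diff(c(1, 4, 9, 16)) in R.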

On Sat, Jan 28, 2012 at 12:28 PM, Kevin Ummel <kevinum...@gmail.com> wrote:
> Thanks. I've played around with pure R solutions. The fastest rewrite of
> diff (for the lag-1 case) I have found is this:
>
> diff2 = function(x) {
>  y = c(x,NA) - c(NA,x)
>  y[2:length(x)]
> }
>
> #Compiling via 'cmpfun' doesn't seem to help (or hurt):
> require(compiler)
> diff2 = cmpfun(diff2)
>
> But that only gets a ~10% improvement over the default 'diff' on my machine,
> which is still too slow for my particular application.
>
> I'm inclined towards Michael's suggestion of inline+Rcpp (or some other use 
> of C under the hood).
>
> Could someone show me how to go about doing that?
>
> Thanks!
> Kevin
>
> On Jan 28, 2012, at 9:14 AM, Peter Langfelder wrote:
>
>> ehm... this doesn't take very many ideas.
>>
>>
>> x = runif(n=10e6, min=0, max=1000)
>> x = round(x)
>>
>> system.time( {
>>  y = x[-1] - x[-length(x)]
>> })
>>
>> I get about 0.5 seconds on my old laptop.
>>
>> HTH
>>
>> Peter
>>
>>
>> On Fri, Jan 27, 2012 at 4:15 PM, Kevin Ummel <kevinum...@gmail.com> wrote:
>>> Hi everyone,
>>>
>>> Speed is the key here.
>>>
>>> I need to find the difference between a vector and its one-period lag (i.e. 
>>> the difference between each value and the subsequent one in the vector). 
>>> Let's say the vector contains 10 million random integers between 0 and 
>>> 1,000. The solution vector will have 9,999,999 values, since there is no
>>> lag for the 1st observation.
>>>
>>> In R we have:
>>>
>>> #Set up input vector
>>> x = runif(n=10e6, min=0, max=1000)
>>> x = round(x)
>>>
>>> #Find one-period difference
>>> y = diff(x)
>>>
>>> Question is: How can I get the 'diff(x)' part as fast as absolutely 
>>> possible? I queried some colleagues who work with other languages, and they 
>>> provided equivalent solutions in Python and Clojure that, on their 
>>> machines, appear to be potentially much faster (I've put the code below in 
>>> case anyone is interested). However, they mentioned that the overhead in 
>>> passing the data between languages could kill any improvements. I don't 
>>> have much experience integrating other languages, so I'm hoping the 
>>> community has some ideas about how to approach this particular problem...
>>>
>>> Many thanks,
>>> Kevin
>>>
>>> In IPython:
>>>
>>> In [3]: import numpy as np
>>> In [4]: arr = np.random.randint(0, 1000, (10000000,1)).astype("int16")
>>> In [5]: arr1 = arr[1:].view()
>>> In [6]: timeit arr2 = arr1 - arr[:-1]
>>> 10 loops, best of 3: 20.1 ms per loop
>>>
>>> In Clojure:
>>>
>>> (defn subtract-lag
>>>  [n]
>>>  (let [v (take n (repeatedly rand))]
>>>    (time (dorun (map - v (cons 0 v))))))
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.

