Re: [R] Fast ave for sorted data?

Charles C. Berry Sun, 15 Feb 2009 11:09:57 -0800

On Sun, 15 Feb 2009, Zhou Fang wrote:

Hi,
This is probably really obvious, but I can't seem to find anything on it.
Is there a fast version of ave for when the data is already sorted in terms of the factor, or if the breaks are already known?

If all you want are means, you can use rle() and colMeans() to good effect:

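foo2 <- function(x, y)
{
        ## assumes equal values of x are adjacent (e.g. x is sorted)
        reps <- rle(x)$lengths          # length of each run of equal x
        lens <- rep(reps, reps)         # run length, repeated for every element
        uniqLens <- unique(lens)
        ## runs of length 1 are already their own mean, so skip them
        for (i in uniqLens[uniqLens != 1]) {
                ## one run per column, so colMeans() gives one mean per run
                y[lens == i] <-
                        rep(colMeans(matrix(y[lens == i], nrow = i)), each = i)
        }
        y
}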

x <- sort( round( runif(100000, 0 , 1 ), 5) )
y <- sample(1000000,100000)
all.equal(ave(y,x),foo2(x,y))

[1] TRUE

system.time(foo2(x,y))

   user  system elapsed
  0.087   0.029   0.117

system.time(ave(y,x))

   user  system elapsed
  1.933   0.030   1.980
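To see why this works, here is a small trace of the same steps on toy data. The values below are made up for illustration, and the name y_new is only used here so that the original y stays available for comparison.

```r
# Toy data (hypothetical values); x is already sorted
x <- c(0.1, 0.2, 0.32, 0.32, 0.4, 0.56, 0.56, 0.7)
y <- c(223, 434, 343, 544, 231, 120, 380, 50)

reps <- rle(x)$lengths      # run lengths: 1 1 2 1 2 1
lens <- rep(reps, reps)     # run length per element: 1 1 2 2 1 2 2 1

# All elements that sit in runs of length 2, one run per column
m <- matrix(y[lens == 2], nrow = 2)
m

# One mean per column, i.e. one mean per run
run_means <- colMeans(m)
run_means

# Write each mean back over every element of its run
y_new <- y
y_new[lens == 2] <- rep(run_means, each = 2)
y_new

all.equal(y_new, ave(y, x))
```

The loop in foo2() just repeats this for every run length that occurs, so the number of iterations is the number of distinct run lengths, not the number of groups.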

If, as in your example, a substantial fraction of the X's are unique, and if you want to generalize to more than means, then you can still gain a lot by treating the unique and non-unique values separately, like this:

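foo <- function(x, y)
{
        ## assumes equal values of x are adjacent (e.g. x is sorted)
        reps <- rle(x)$lengths
        len.not.1 <- rep(reps, reps) != 1       # TRUE where x is repeated
        ## only the repeated values need ave(); unique ones stay as they are
        y[len.not.1] <- ave(y[len.not.1], x[len.not.1])
        y
}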

y <- sample(1000000,100000)
x <- sort( round( runif(100000, 0 , 2 ), 5) )
system.time(foo(x,y))

   user  system elapsed
  0.577   0.027   0.628

system.time(ave(y,x))

   user  system elapsed
  2.513   0.038   2.545

table(table(x))


    1     2     3     4     5     6
60526 15161  2578   318    28     1
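Since X stays fixed while Y changes, the rle() step only needs to be done once. The following is a sketch of that idea, not something timed for this post: make_ave_sorted is a hypothetical helper name, and the FUN argument is simply passed on to ave(), which is how the approach extends beyond means.

```r
# Hypothetical helper: do the work that depends only on x once,
# and return a function that can be applied to many different y.
make_ave_sorted <- function(x) {
  reps <- rle(x)$lengths
  dup  <- rep(reps, reps) != 1   # TRUE where the x value is repeated
  xdup <- x[dup]                 # grouping values for the repeated part
  function(y, FUN = mean) {
    if (any(dup)) {
      y[dup] <- ave(y[dup], xdup, FUN = FUN)
    }
    y
  }
}

set.seed(1)
x <- sort(round(runif(1000, 0, 2), 2))
f <- make_ave_sorted(x)          # pay the rle() cost once

y1 <- rnorm(1000)
all.equal(f(y1), ave(y1, x))
all.equal(f(y1, median), ave(y1, x, FUN = median))
```

How much this saves depends on how many of the X's are unique; with few unique values most of the time is still spent inside ave().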

And if neither of these is quite good enough, a line or two of C code should do the trick. See package 'inline'.
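Before reaching for C, a loop-free variant for means may be worth a try. This is a different technique from the two functions above and was not timed here: it takes group sums as differences of a running total at the run boundaries. The name make_mean_sorted is hypothetical, and the conversion to double is there because cumsum() on a long integer vector can overflow.

```r
# Hypothetical helper: group means for sorted x via cumulative sums.
make_mean_sorted <- function(x) {
  reps <- rle(x)$lengths         # run lengths, computed once for fixed x
  ends <- cumsum(reps)           # index of the last element of each run
  function(y) {
    cs   <- cumsum(as.numeric(y))    # running total, in double precision
    sums <- diff(c(0, cs[ends]))     # per-run sums from boundary differences
    rep(sums / reps, reps)           # per-run means, expanded to full length
  }
}

set.seed(2)
x <- sort(round(runif(1000, 0, 2), 2))
g <- make_mean_sorted(x)

y2 <- sample(1000000, 1000)
all.equal(g(y2), ave(y2, x))
```

One caveat: differencing a long running total can lose a little floating-point precision compared with summing each group directly, so results should be compared with all.equal() rather than identical().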



HTH,

Chuck

Basically, I have:
X = 0.1, 0.2, 0.32, 0.32, 0.4, 0.56, 0.56, 0.7...
Y = 223, 434, 343, 544, 231.... etc
of the same, admittedly large length.
Now note that some of the values of X are repeated. What I want to do is, for those X that are repeated, take the corresponding values of Y and change them to the average for that particular X.
So, ave(Y,X) will work. But it's very slow, and certainly not suited to my problem, where Y changes and X stays the same and I need to repeatedly recalculate the averaging of Y. Ave also does not take advantage of the sorting of the data.
So, is there an alternative? (Presumably avoiding loops.)

Thanks,

Zhou Fang

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
