On Sun, 15 Feb 2009, Zhou Fang wrote:

Hi,

This is probably really obvious, but I can't seem to find anything on it.

Is there a fast version of ave for when the data is already sorted in terms of the factor, or if the breaks are already known?


If all you want are means, you can use rle() and colMeans() to good effect:

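foo2 <- function (x,y)
{
        ## assumes x is already sorted, so equal values are adjacent
        reps <- rle(x)$lengths          # length of each run of equal x
        lens <- rep(reps,reps)          # run length repeated for every element
        uniqLens <- unique(lens)
        ## runs of length 1 are already their own mean, so skip them
        for (i in uniqLens[ uniqLens != 1]){
                ## one column per run of length i; colMeans() gives the run means
                y[ lens == i] <-
                        rep( colMeans(matrix(y[ lens == i], nrow=i)), each=i)
                }
        y

}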

x <- sort( round( runif(100000, 0 , 1 ), 5) )
y <- sample(1000000,100000)
all.equal(ave(y,x),foo2(x,y))
[1] TRUE
system.time(foo2(x,y))
   user  system elapsed
  0.087   0.029   0.117
system.time(ave(y,x))
   user  system elapsed
  1.933   0.030   1.980
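
To see what the run-length bookkeeping in foo2() is doing, here is a small sketch using the X values from the original question. The Y values after the fifth are made up for illustration, since the question elides them.

```r
## X from the question; the last three Y values are invented for illustration
x <- c(0.1, 0.2, 0.32, 0.32, 0.4, 0.56, 0.56, 0.7)
y <- c(223, 434, 343, 544, 231, 100, 300, 50)

reps <- rle(x)$lengths      # length of each run of equal x: 1 1 2 1 2 1
lens <- rep(reps, reps)     # run length per element:        1 1 2 2 1 2 2 1

## elements with lens == 2 form two runs; as a 2-row matrix each run is a column
m <- matrix(y[lens == 2], nrow = 2)
colMeans(m)                 # 443.5 200.0

ave(y, x)                   # 223.0 434.0 443.5 443.5 231.0 200.0 200.0 50.0
```

Because every selected run has the same length, filling the matrix column by column lines up exactly one run per column, which is what lets colMeans() replace a loop over groups.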



If, as in your example, a substantial fraction of the X's are unique, and if you want to generalize to more than means, then you can still gain a lot by treating the unique and non-unique values separately like this:

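foo <- function (x,y)
{
        ## assumes x is already sorted, so equal values are adjacent
        reps <- rle(x)$lengths
        ## TRUE for elements whose x value occurs more than once
        len.not.1 <- rep(reps,reps) != 1
        ## only the repeated values need ave(); unique ones stay as they are
        y[ len.not.1] <- ave( y[ len.not.1], x[ len.not.1 ])
        y

}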

y <- sample(1000000,100000)
x <- sort( round( runif(100000, 0 , 2 ), 5) )
system.time(foo(x,y))
   user  system elapsed
  0.577   0.027   0.628
system.time(ave(y,x))
   user  system elapsed
  2.513   0.038   2.545
table(table(x))

    1     2     3     4     5     6
60526 15161  2578   318    28     1

And if neither of these is quite good enough, a line or two of C code should do the trick. See package 'inline'.
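
Short of C, when X stays fixed and only Y changes, the grouping can be computed once and reused. The following is only a sketch under that assumption: make_run_averager() is a hypothetical helper name, not something from base R or from this thread, and it uses rowsum() for the per-group sums instead of the colMeans() trick above. It has not been timed against foo2().

```r
## Hypothetical helper: precompute the grouping for a sorted x once and
## return a function that averages any y of the same length over those groups.
make_run_averager <- function(x) {
        reps <- rle(x)$lengths                 # run lengths, computed once
        grp  <- rep(seq_along(reps), reps)     # integer group id per element
        function(y) {
                ## rowsum() sums y within each group; dividing by the run
                ## lengths turns the sums into means
                sums  <- rowsum(as.numeric(y), grp, reorder = FALSE)
                means <- as.vector(sums) / reps
                rep(means, reps)               # expand back to length(y)
        }
}

x <- sort(round(runif(1000, 0, 1), 2))
avg <- make_run_averager(x)                    # grouping work happens here only

y1 <- sample(10000, 1000)
y2 <- sample(10000, 1000)
all.equal(avg(y1), ave(y1, x))
all.equal(avg(y2), ave(y2, x))
```

The closure keeps reps and grp around between calls, so each new Y costs one rowsum() and one rep() and no repeated rle().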


HTH,

Chuck

Basically, I have:
X = 0.1, 0.2, 0.32, 0.32, 0.4, 0.56, 0.56, 0.7...
Y = 223, 434, 343, 544, 231.... etc
of the same, admittedly large length.

Now note that some of the values of X are repeated. What I want to do is, for those X that are repeated, take the corresponding values of Y and change them to the average for that particular X.

So, ave(Y,X) will work. But it's very slow, and certainly not suited to my problem, where Y changes and X stays the same and I need to repeatedly recalculate the averaging of Y. ave() also does not take advantage of the sorting of the data.

So, is there an alternative? (Presumably avoiding loops.)

Thanks,

Zhou Fang

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
