On Sun, 15 Feb 2009, Zhou Fang wrote:
Hi,
This is probably really obvious, but I can't seem to find anything on it.
Is there a fast version of ave for when the data is already sorted in terms
of the factor, or if the breaks are already known?
If all you want are means, you can use rle() and colMeans() to good
effect:
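foo2 <- function(x, y)
{
    # x must be sorted so that equal values are adjacent
    reps <- rle(x)$lengths      # length of each run of equal x
    lens <- rep(reps, reps)     # run length, repeated for each element
    uniqLens <- unique(lens)
    for (i in uniqLens[uniqLens != 1]) {
        # one run per column, so colMeans() gives the per-run means
        y[lens == i] <-
            rep(colMeans(matrix(y[lens == i], nrow = i)), each = i)
    }
    y
}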
x <- sort( round( runif(100000, 0 , 1 ), 5) )
y <- sample(1000000,100000)
all.equal(ave(y,x),foo2(x,y))
[1] TRUE
system.time(foo2(x,y))
user system elapsed
0.087 0.029 0.117
system.time(ave(y,x))
user system elapsed
1.933 0.030 1.980
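To see why the matrix() step works, a small hand-checkable case may help. The values below are made up for illustration and are not from the original data. All runs of a given length are laid out one run per column, so colMeans() returns one mean per run:

```r
# Toy data (hypothetical values); x is sorted so equal values are adjacent.
x <- c(1, 2, 2, 3, 3, 4)
y <- c(10, 20, 40, 3, 5, 7)

reps <- rle(x)$lengths      # run lengths: 1 2 2 1
lens <- rep(reps, reps)     # run length per element: 1 2 2 2 2 1

# Elements belonging to runs of length 2, one run per column.
m <- matrix(y[lens == 2], nrow = 2)
run_means <- colMeans(m)    # one mean per run of length 2

# Spread each run mean back over the elements of its run.
res <- y
res[lens == 2] <- rep(run_means, each = 2)
res
```

The singletons (lens == 1) are never touched, which is where most of the time is saved when many x values are unique.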
If, as in your example, a substantial fraction of the X's are unique, and
if you want to generalize to more than means, then you can still gain a
lot by treating the unique and non-unique values separately like this:
foo <- function(x, y)
{
    reps <- rle(x)$lengths
    len.not.1 <- rep(reps, reps) != 1
    y[len.not.1] <- ave(y[len.not.1], x[len.not.1])
    y
}
y <- sample(1000000,100000)
x <- sort( round( runif(100000, 0 , 2 ), 5) )
system.time(foo(x,y))
user system elapsed
0.577 0.027 0.628
system.time(ave(y,x))
user system elapsed
2.513 0.038 2.545
table(table(x))
    1     2     3     4     5     6
60526 15161  2578   318    28     1
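To go beyond means, the same split can pass a summary function through to ave() via its FUN argument. This is only a sketch: foo_any is a hypothetical name, and it assumes FUN returns the value itself when given a single value (true for mean, median, min and max, but not for sd, for example), since singletons are left as they are.

```r
# Sketch: like foo(), but with a user-supplied summary function.
# Assumes x is sorted and FUN(v) == v for a single value v.
foo_any <- function(x, y, FUN = mean)
{
    reps <- rle(x)$lengths
    in.run <- rep(reps, reps) != 1     # TRUE for elements of repeated x
    y[in.run] <- ave(y[in.run], x[in.run], FUN = FUN)
    y
}
```

For example, foo_any(x, y, median) should agree with ave(y, x, FUN = median) under those assumptions.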
And if neither of these is quite good enough, a line or two of C code
should do the trick. See package 'inline'.
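Before reaching for C: since X stays the same while Y changes, it may also be worth computing the grouping once and reusing it for every new Y. A minimal sketch, assuming x is sorted; make_group_mean is a hypothetical helper name, not part of base R or any package, and it has not been timed against the functions above. It uses base R's rowsum() for the per-group sums.

```r
# Sketch: precompute the run structure of a sorted x once,
# then return a function that averages any y over those runs.
make_group_mean <- function(x)
{
    reps <- rle(x)$lengths                 # length of each run of equal x
    grp  <- rep(seq_along(reps), reps)     # group id for each element
    function(y) {
        # per-group sums as a one-column matrix, groups in order 1, 2, ...
        sums <- rowsum(y, grp, reorder = FALSE)
        # group means, expanded back to one value per element
        unname((sums[, 1] / reps)[grp])
    }
}

# Usage: build once from x, then call repeatedly as y changes.
# avg <- make_group_mean(x)
# avg(y)
```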
HTH,
Chuck
Basically, I have:
X = 0.1, 0.2, 0.32, 0.32, 0.4, 0.56, 0.56, 0.7...
Y = 223, 434, 343, 544, 231.... etc
of the same, admittedly large length.
Now note that some of the values of X are repeated. What I want to do is, for
those X that are repeated, take the corresponding values of Y and change them
to the average for that particular X.
So, ave(Y,X) will work. But it's very slow, and certainly not suited to my
problem, where Y changes and X stays the same and I need to repeatedly
recalculate the averaging of Y. ave() also does not take advantage of the
sorting of the data.
So, is there an alternative? (Presumably avoiding loops.)
Thanks,
Zhou Fang
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901