> Another application of that technique can be used to quickly compute
> medians by groups:
>
> gm <- function(x, group){ # medians by group:
> sapply(split(x,group),median)
>    o<-order(group, x)
>    group <- group[o]
>    x <- x[o]
>    changes <- group[-1] != group[-length(group)]
>    first <- which(c(TRUE, changes))
>    last <- which(c(changes, TRUE))
>    lowerMedian <- x[floor((first+last)/2)]
>    upperMedian <- x[ceiling((first+last)/2)]
>    median <- (lowerMedian+upperMedian)/2
>    names(median) <- group[first]
>    median
> }
>
> For a 10^5 long x and a somewhat fewer than 3*10^4 distinct groups
> (in random order) the times are:
>
>> group<-sample(1:30000, size=100000, replace=TRUE)
>> x<-rnorm(length(group))*10 + group
>> unix.time(z0<-sapply(split(x,group), median))
>    user  system elapsed
>    2.72    0.00    3.20
>> unix.time(z1<-gm(x,group))
>    user  system elapsed
>    0.12    0.00    0.16
>> identical(z1,z0)
> [1] TRUE
I get:

> unix.time(z0<-sapply(split(x,group), median))
   user  system elapsed
  2.733   0.017   2.766
> unix.time(z1<-gm(x,group))
   user  system elapsed
  2.897   0.032   2.946

Hadley

--
http://had.co.nz/
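
One possible explanation for the two timings coming out nearly equal, offered as an assumption rather than something confirmed in this thread: in the quoted gm(), the text sapply(split(x,group),median) sits on its own line directly after the "# medians by group:" comment. If that line was originally part of the comment and got wrapped in transit, then pasting the function as quoted makes gm() run the slow split/sapply computation first, discard its result, and only then do the sort-based work. The sketch below keeps the comment on one line so that only the sort-based path runs. It uses system.time() for the timings, set.seed(1) is an arbitrary choice for reproducibility, and the name med is mine (chosen to avoid masking median()). No timings are claimed here; they will vary by machine.

```r
# Sketch of gm() with the comment kept on a single line, so that no
# sapply(split(...)) call is executed inside the function body.
gm <- function(x, group) {
  # medians by group; intended to match sapply(split(x, group), median)
  o <- order(group, x)                 # sort by group, then by value within group
  group <- group[o]
  x <- x[o]
  changes <- group[-1] != group[-length(group)]
  first <- which(c(TRUE, changes))     # position of the first element of each group
  last  <- which(c(changes, TRUE))     # position of the last element of each group
  mid <- (first + last) / 2            # middle position; fractional for even-sized groups
  med <- (x[floor(mid)] + x[ceiling(mid)]) / 2
  names(med) <- group[first]
  med
}

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
group <- sample(1:30000, size = 100000, replace = TRUE)
x <- rnorm(length(group)) * 10 + group

system.time(z0 <- sapply(split(x, group), median))
system.time(z1 <- gm(x, group))
all.equal(z1, z0)   # compares values and names, with a numeric tolerance
```

The comparison uses all.equal() rather than identical() as a cautious choice, since the two approaches average the middle pair of values by different routes.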