(this post suggests a patch to the sources, so i allow myself to divert it to r-devel)
Bert Gunter wrote: > x a numeric vector, matrix or data frame. > y NULL (default) or a vector, matrix or data frame with compatible > dimensions to x. The default is equivalent to y = x (but more efficient). > > bert points to an interesting fragment of ?var: it suggests that computing var(x) is more efficient than computing var(x,x), for any x valid as input to var. indeed: set.seed(0) x = matrix(rnorm(10000), 100, 100) library(rbenchmark) benchmark(replications=1000, columns=c('test', 'elapsed'), var(x), var(x, x)) # test elapsed # 1 var(x) 1.091 # 2 var(x, x) 2.051 that's of course, so to speak, unreasonable: for what var(x) does is actually computing the covariance of x and x, which should be the same as var(x,x). the hack is that if y is given, there's an overhead of memory allocation for *both* x and y when y is given, as seen in src/main/cov.c:720+. incidentally, it seems that the problem can be solved with a trivial fix (see the attached patch), so that set.seed(0) x = matrix(rnorm(10000), 100, 100) library(rbenchmark) benchmark(replications=1000, columns=c('test', 'elapsed'), var(x), var(x, x)) # test elapsed # 1 var(x) 1.121 # 2 var(x, x) 1.107 with the quick checks all.equal(var(x), var(x, x)) # TRUE all(var(x) == var(x, x)) # TRUE and for cor it seems to make cor(x,x) slightly faster than cor(x), while originally it was twice slower: # original benchmark(replications=1000, columns=c('test', 'elapsed'), cor(x), cor(x, x)) # test elapsed # 1 cor(x) 1.196 # 2 cor(x, x) 2.253 # patched benchmark(replications=1000, columns=c('test', 'elapsed'), cor(x), cor(x, x)) # test elapsed # 1 cor(x) 1.207 # 2 cor(x, x) 1.204 (there is a visible penalty due to an additional pointer test, but it's 10ms on 1000 replications with 10000 data points, which i think is negligible.) > This is as clear as I would know how to state. i believe bert is right. however, with the above fix, this can now be rewritten as: " x: a numeric vector, matrix or data frame. y: a vector, matrix or data frame with dimensions compatible to those of x. By default, y = x. " which, to my simple mind, is even more clear than what bert would know how to state, and less likely to cause the sort of confusion that originated this thread. the attached patch suggests modifications to src/main/cov.c and src/library/stats/man/cor.Rd. it has been prepared and checked as follows: svn co https://svn.r-project.org/R/trunk trunk cd trunk # edited the sources svn diff > cov.diff svn revert -R src patch -p0 < cov.diff tools/rsync-recommended ./configure make make check bin/R # subsequent testing within R if you happen to consider this patch for a commit, please be sure to examine and test it carefully first. vQ
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.