Doesn't Fortran still require that the arguments to a function not alias each other (in whole or in part)? I could imagine that var() might call into Fortran code (BLAS or LAPACK). Would you want to chance erroneous results at a high optimization level just to save a bit of time in an unusual situation?
(I could also imagine someone changing the R interpreter so that x and x[-length(x)] could share the same memory block, and that could cause Fortran aliasing problems as well.)

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

> -----Original Message-----
> From: r-devel-boun...@r-project.org
> [mailto:r-devel-boun...@r-project.org] On Behalf Of Wacek Kusnierczyk
> Sent: Monday, March 23, 2009 4:40 PM
> To: r-devel@r-project.org
> Cc: r-h...@r-project.org; rkevinbur...@charter.net; Bert Gunter
> Subject: Re: [Rd] [R] variance/mean
>
> (this post suggests a patch to the sources, so i allow myself to divert
> it to r-devel)
>
> Bert Gunter wrote:
> > x a numeric vector, matrix or data frame.
> > y NULL (default) or a vector, matrix or data frame with compatible
> > dimensions to x. The default is equivalent to y = x (but more efficient).
>
> bert points to an interesting fragment of ?var: it suggests that
> computing var(x) is more efficient than computing var(x, x), for any x
> valid as input to var. indeed:
>
> set.seed(0)
> x = matrix(rnorm(10000), 100, 100)
>
> library(rbenchmark)
> benchmark(replications=1000, columns=c('test', 'elapsed'),
>     var(x),
>     var(x, x))
> #        test elapsed
> # 1    var(x)   1.091
> # 2 var(x, x)   2.051
>
> that's of course, so to speak, unreasonable: for what var(x) does is
> actually computing the covariance of x and x, which should be the same
> as var(x, x).
>
> the hack is that if y is given, there's an overhead of memory allocation
> for *both* x and y, as seen in src/main/cov.c:720+.
> incidentally, it seems that the problem can be solved with a trivial fix
> (see the attached patch), so that
>
> set.seed(0)
> x = matrix(rnorm(10000), 100, 100)
>
> library(rbenchmark)
> benchmark(replications=1000, columns=c('test', 'elapsed'),
>     var(x),
>     var(x, x))
> #        test elapsed
> # 1    var(x)   1.121
> # 2 var(x, x)   1.107
>
> with the quick checks
>
> all.equal(var(x), var(x, x))
> # TRUE
>
> all(var(x) == var(x, x))
> # TRUE
>
> and for cor it seems to make cor(x, x) slightly faster than cor(x),
> while originally it was twice as slow:
>
> # original
> benchmark(replications=1000, columns=c('test', 'elapsed'),
>     cor(x),
>     cor(x, x))
> #        test elapsed
> # 1    cor(x)   1.196
> # 2 cor(x, x)   2.253
>
> # patched
> benchmark(replications=1000, columns=c('test', 'elapsed'),
>     cor(x),
>     cor(x, x))
> #        test elapsed
> # 1    cor(x)   1.207
> # 2 cor(x, x)   1.204
>
> (there is a visible penalty due to an additional pointer test, but it's
> 10 ms over 1000 replications with 10000 data points, which i think is
> negligible.)
>
> > This is as clear as I would know how to state.
>
> i believe bert is right.
>
> however, with the above fix, this can now be rewritten as:
>
> "
> x: a numeric vector, matrix or data frame.
> y: a vector, matrix or data frame with dimensions compatible with those
> of x. By default, y = x.
> "
>
> which, to my simple mind, is even clearer than what bert would know
> how to state, and less likely to cause the sort of confusion that
> originated this thread.
>
> the attached patch suggests modifications to src/main/cov.c and
> src/library/stats/man/cor.Rd.
> it has been prepared and checked as follows:
>
> svn co https://svn.r-project.org/R/trunk trunk
> cd trunk
> # edited the sources
> svn diff > cov.diff
> svn revert -R src
> patch -p0 < cov.diff
>
> tools/rsync-recommended
> ./configure
> make
> make check
> bin/R
> # subsequent testing within R
>
> if you happen to consider this patch for a commit, please be sure to
> examine and test it carefully first.
>
> vQ

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel