Useful and interesting. Thanks for your prompt reply. -- Mike
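One quick sanity check for anyone reproducing the comparison quoted below: confirm which BLAS shared library a process has actually loaded before trusting the timings. A Linux-only sketch using /proc follows; it inspects the current shell's own PID as a stand-in, and for an R session you would substitute that session's PID (e.g. from pgrep). A line mentioning libopenblas (or libRblas) tells you which BLAS is really in use.

```shell
# List the shared objects mapped into a process (Linux /proc).
# Replace $$ with a running R session's PID to check R itself;
# look for libopenblas vs. libRblas in the output.
pid=$$
awk '$NF ~ /\.so/ {print $NF}' "/proc/$pid/maps" | sort -u
```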
On Sun, Nov 16, 2014 at 2:29 AM, Prof Brian Ripley <rip...@stats.ox.ac.uk> wrote:
> On 16/11/2014 00:11, Michael Hannon wrote:
>>
>> Greetings.  I'd like to get some advice about using OpenBLAS with R,
>> rather than using the BLAS that comes built in to R.
>
> That was really a topic for the R-devel list: see the posting guide.
>
>> I've tried this on my Fedora 20 system (see the appended for details).
>> I ran a simple test -- multiplying two large matrices -- and the
>> results were very impressive, i.e., in favor of OpenBLAS, which is
>> consistent with discussions I've seen on the web.
>
> If that is all you do, then you should be using an optimized BLAS, and
> choose the one(s) best for your (unstated) machine(s).
>
>> My concern is that maybe this is too good to be true.  I.e., the
>> standard R configuration is vetted by thousands of people every day.
>> Can I have the same degree of confidence with OpenBLAS that I have in
>> the built-in version?
>
> No.  And it is 'too good to be true' for most users of R, for whom
> BLAS operations take a negligible proportion of their CPU time.
>
>> And/or are there other caveats to using OpenBLAS of which I should be
>> aware?
>
> Yes: see the 'R Installation and Administration Manual'.  Known issues
> include:
>
> 1) Optimized BLAS trade accuracy for speed.  Surprisingly much
> published R code relies on using extended-precision FPU registers for
> intermediate results, which optimized BLAS do much less than the
> reference BLAS.
>
> Some packages rely on a particular sign of the solution to svd or
> eigen problems: people then report as bugs that optimized BLAS give a
> different sign from the reference BLAS.
>
> 2) Fast BLAS normally use multi-threading: that usually helps elapsed
> time for a single R task at the expense of increased total CPU time.
> Fine if you have unused CPU cores, but not advantageous in a
> fully-used multi-core machine, e.g. one that is doing many R sessions
> in parallel.
> 3) Many BLAS optimize their use of CPU caches.  This works best if
> the BLAS-using process is the only task running on a particular core
> (or CPU where CPU cores share cache).  (It also means that optimizing
> on one CPU model and running on another can be disastrous.)
>
>> Thanks.
>>
>> -- Mike
>>
>> #### Here's the version of R, compiled locally with configuration options:
>> #### ./configure --enable-R-shlib --enable-BLAS-shlib
>>
>> $ R
>>
>> R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
>> Copyright (C) 2014 The R Foundation for Statistical Computing
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>> .
>> .
>> .
>>
>> #### Here's the R source code for this little test:
>>
>> library(microbenchmark)
>>
>> mSize <- 10000
>> set.seed(42)
>>
>> aMat <- matrix(rnorm(mSize * mSize), nrow=mSize)
>> bMat <- matrix(rnorm(mSize * mSize), nrow=mSize)
>>
>> cMat <- aMat %*% bMat    ## do the calculation once to see that it works
>>
>> traceCMat <- sum(diag(cMat))    ## a mild sanity check on the calculation
>> traceCMat
>>
>> microbenchmark(aMat %*% bMat, times=5L)    ## repeat a few more times
>>
>> -----
>>
>> #### Here is the output from the code, running under various conditions:
>>
>>> traceCMat    ###### Using the built-in BLAS from R
>> [1] -11367.55
>>> microbenchmark(aMat %*% bMat, times=5L)
>> Unit: seconds
>>           expr      min       lq     mean   median       uq     max neval
>>  aMat %*% bMat 675.0064 675.5325 675.4897 675.5857 675.6618 675.662     5
>>
>> ----------
>>
>>> traceCMat    ###### Using libopenblas.so from Fedora
>> [1] -11367.55
>>> microbenchmark(aMat %*% bMat, times=5L)
>> Unit: seconds
>>           expr      min       lq     mean   median       uq      max neval
>>  aMat %*% bMat 70.67843 70.70545 70.76365 70.73026 70.83935 70.86475     5
>>
>> ----------
>>
>>> traceCMat <- sum(diag(cMat))    ###### libopenblas.so from Fedora with
>>> traceCMat                       ###### export OMP_NUM_THREADS=6
>> [1] -11367.55
>>> microbenchmark(aMat %*% bMat, times=5L)
>> Unit: seconds
>>           expr      min       lq    mean   median       uq      max neval
>>  aMat %*% bMat 69.99146 70.02426 70.3466 70.08327 70.39537 71.23866     5
>>
>> ###### Fedora libopenblas.so appears to be single-threaded
>>
>> ----------
>>
>>> traceCMat <- sum(diag(cMat))    ###### libopenblas.so compiled locally
>>> traceCMat                       ###### from source w/OMP_NUM_THREADS=6
>> [1] -11367.55
>>> microbenchmark(aMat %*% bMat, times=5L)
>> Unit: seconds
>>           expr      min       lq     mean   median       uq      max neval
>>  aMat %*% bMat 26.77385 27.10434 27.17862 27.12485 27.16301 27.72705     5
>>
>> ###### Locally-compiled OpenBLAS appears to be multi-threaded
>> ###### The microbenchmark appeared to use all 8 processors, even
>> ###### though I asked for only 6.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Brian D. Ripley, rip...@stats.ox.ac.uk
> Emeritus Professor of Applied Statistics, University of Oxford
> 1 South Parks Road, Oxford OX1 3TG, UK
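On the closing observation that all 8 processors were busy despite OMP_NUM_THREADS=6: OpenBLAS consults its own variables first, with priority OPENBLAS_NUM_THREADS > GOTO_NUM_THREADS > OMP_NUM_THREADS, and a pthreads build in particular is more reliably capped via OPENBLAS_NUM_THREADS. Exporting both before launching R covers either build. A sketch (the Rscript line is commented out because it assumes an OpenBLAS-linked R installation like the one described above):

```shell
# Cap BLAS threading before launching R.  OpenBLAS reads
# OPENBLAS_NUM_THREADS in preference to OMP_NUM_THREADS (which the
# OpenMP build honours); exporting both covers either build.
export OPENBLAS_NUM_THREADS=6
export OMP_NUM_THREADS=6
# Rscript benchmark.R   # assumption: a script like the test above
echo "BLAS thread cap: $OPENBLAS_NUM_THREADS"
```

Separately, the single-threaded behaviour of the stock Fedora library is consistent with Fedora's openblas packaging, which ships the serial libopenblas alongside threaded variants (libopenblasp for pthreads, libopenblaso for OpenMP); linking one of those instead should give multi-threaded behaviour without a local build.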