The R-devel version of R provides a pluggable BLAS, which makes such tests fairly easy (although building the BLAS libraries themselves is not). On dual Opterons, using multiple threads is often not worthwhile and can be counter-productive (Doug Bates has found some dramatic examples, and you can see them in my timings below).
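With the pluggable BLAS, the swap is just a shared-library replacement under R's lib/ directory. A minimal sketch of the mechanism, using dummy files in a temporary directory (the real locations of R's lib/libRblas.so and of the tuned library, e.g. libacml.so, are installation-specific and stand in here as assumptions):

```shell
# Demonstrates the BLAS-swap mechanism with dummy files in temp dirs;
# substitute your real R lib/ directory and the real tuned BLAS path.
RLIB=$(mktemp -d)            # stands in for R's lib/ directory
ACMLDIR=$(mktemp -d)         # stands in for the ACML install tree
touch "$RLIB/libRblas.so"    # R's internal reference BLAS
touch "$ACMLDIR/libacml.so"  # the tuned BLAS you want R to load

# Keep the original as a fallback, then point libRblas.so at the tuned library.
mv "$RLIB/libRblas.so" "$RLIB/libRblas.so.keep"
ln -s "$ACMLDIR/libacml.so" "$RLIB/libRblas.so"

readlink "$RLIB/libRblas.so"   # confirm where the symlink now points
```

Restoring the internal BLAS is the reverse move, which is why keeping the `.keep` copy is worthwhile.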
So, timings for FC3, gcc 3.4.6, dual Opteron 252, 64-bit build of R. ACML 3.5.0 is by far the easiest to install (on R-devel all you need to do is to link libacml.so to lib/libRblas.so) and pretty competitive, so that is what I normally use. These timings are not very repeatable: only to a few %, even after averaging quite a few runs.

set.seed(123)
X <- matrix(rnorm(1e6), 1000)
system.time(for(i in 1:25) X %*% X)
system.time(for(i in 1:25) solve(X))
system.time(for(i in 1:10) svd(X))

(Each system.time() result below is user, system and elapsed time in seconds, followed by the user and system times of child processes. With multiple threads the elapsed time can be less than the user time -- compare them to see the real benefit, or lack of it.)

internal BLAS (-O3)

> system.time(for(i in 1:25) X%*%X)
[1]  96.939   0.341  97.375   0.000   0.000
> system.time(for(i in 1:25) solve(X))
[1] 110.316   1.652 112.006   0.000   0.000
> system.time(for(i in 1:10) svd(X))
[1] 165.550   1.131 166.806   0.000   0.000

Goto 1.03, 1 thread

> system.time(for(i in 1:25) X%*%X)
[1] 12.949  0.191 13.143  0.000  0.000
> system.time(for(i in 1:25) solve(X))
[1] 23.201  1.449 24.652  0.000  0.000
> system.time(for(i in 1:10) svd(X))
[1] 43.318  1.016 44.361  0.000  0.000

Goto 1.03, dual CPU

> system.time(for(i in 1:25) X%*%X)
[1] 15.038  0.244  8.488  0.000  0.000
> system.time(for(i in 1:25) solve(X))
[1] 26.569  2.239 19.814  0.000  0.000
> system.time(for(i in 1:10) svd(X))
[1] 59.912  1.799 50.350  0.000  0.000

ACML 3.5.0 (single-threaded)

> system.time(for(i in 1:25) X%*%X)
[1] 13.794  0.368 14.164  0.000  0.000
> system.time(for(i in 1:25) solve(X))
[1] 22.990  1.695 24.710  0.000  0.000
> system.time(for(i in 1:10) svd(X))
[1] 48.267  1.373 49.662  0.000  0.000

ATLAS 3.6.0, single-threaded

> system.time(for(i in 1:25) X%*%X)
[1] 16.164  0.404 16.572  0.000  0.000
> system.time(for(i in 1:25) solve(X))
[1] 26.200  1.704 27.907  0.000  0.000
> system.time(for(i in 1:10) svd(X))
[1] 50.150  1.462 51.619  0.000  0.000

ATLAS 3.6.0, multi-threaded

> system.time(for(i in 1:25) X%*%X)
[1] 17.657  0.468  9.775  0.000  0.000
> system.time(for(i in 1:25) solve(X))
[1] 38.388  2.353 30.141  0.000  0.000
> system.time(for(i in 1:10) svd(X))
[1] 95.611  3.039 88.917  0.000  0.000

On Sun, 23 Jul 2006, Evan Cooch wrote:
> Greetings -
>
> A quick perusal of some of the posts to this mailing list suggests the level
> of the questions is probably beyond someone working at my level, but at
> the risk of looking foolish publicly (something I find I get
> increasingly comfortable with as I get older), here goes:
>
> My research group recently purchased a multi-Opteron system (a bunch of
> 880 chips), running 64-bit RHEL 4 (which we have site-licensed at our
> university, so it cost us nothing - good price) with SMP support built
> into the kernel (perhaps obviously, for a multi-processor system). Several of
> our users use [R], which I've only used on a few occasions. However, it
> is part of my task to get [R] installed for folks using this system.
>
> While the simple, basic compile sequence (./configure, make, make check,
> make install) went smoothly, it's pretty clear from our benchmarks that
> the [R] code isn't running as 'rocket-fast' as it should for a system
> like this. So, I dug a bit deeper. Most of the jobs we want to run could
> benefit from BLAS support (lots of array manipulations and other bits of
> linear algebra), and a few other compilation optimizations - and here is
> where I seek advice.
>
> 1) Looks like there are 3-4 flavours: LAPACK, ATLAS, ACML
> (AMD chips...), and Goto. In reading what I can find, it seems that
> there are reasons not to use ACML (single-threaded) despite the AMD chips,
> reasons to avoid ATLAS (some hassles compiling on RHEL 4 boxes), reasons
> to avoid LAPACK (ibid), but apparently no problems with Goto BLAS.
>
> Is that a reasonable summary? At the risk of starting a larger
> discussion, I'm simply looking to get BLAS support, yielding the fastest
> [R] code with the minimum of hassles (while tweaking lines of configure
> files, weird linker sequences and all that used to appeal when I was a
> student, I don't have time at this stage). So, any quick recommendation
> for *which* BLAS library? My quick assessment suggests Goto BLAS, but
> I'm hoping for some confirmation.
>
> 3) Compilation of BLAS - I can compile for 32-bit or 64-bit.
> Presumably, given we've invested in 64-bit chips and a 64-bit OS, we'd
> like to consider a 64-bit compilation. Which, also presumably, means
> we'd need a 64-bit compilation of [R]. While I've read the short blurb on
> CRAN concerning 64-bit vs 32-bit compilations (data size vs speed), I'd
> be happy to have both on our machine. But I'm not sure how one
> specifies 64 bits in the [R] compilation - what flags do I need to set
> during ./configure, or what config file do I need to edit?
>
> Thanks very much in advance - and, again, apologies for the 'low-level'
> of these questions, but one needs to start somewhere.
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595