On 09/05/2014 15:36, Adam Ralph wrote:

Can it be parallelized? That is how you reduce run-time. One of the
tests, matrix-matrix multiplication, has been successfully sped up by
using GPUs. CUDA, a derivative of C, is the language used for this.
To be fair, you only see the benefit for really large matrices;
smaller ones may actually be slower on GPUs.

Talking about matrix multiplication, the benchmark is biased because they used OpenBLAS. The core multiplication is done in assembly, so it's really machine code that rules!
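To see how much of the benchmark is really the BLAS kernel, here is a rough sketch comparing a pure-Python triple loop against NumPy's `@`, which dispatches to whatever BLAS NumPy was built against (OpenBLAS, MKL, ...). The matrix size and timings are arbitrary and machine-dependent; the point is only the gap.

```python
import time
import numpy as np

n = 100
a = np.random.rand(n, n)
b = np.random.rand(n, n)

def naive_matmul(a, b):
    """Textbook triple loop in pure Python -- no BLAS involved."""
    al, bl = a.tolist(), b.tolist()
    n, m, p = len(al), len(bl), len(bl[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = al[i][k]
            row = bl[k]
            ci = c[i]
            for j in range(p):
                ci[j] += aik * row[j]
    return np.array(c)

t0 = time.perf_counter()
c_naive = naive_matmul(a, b)
t_naive = time.perf_counter() - t0

t0 = time.perf_counter()
c_blas = a @ b  # handled by the underlying BLAS library
t_blas = time.perf_counter() - t0

# Same numbers, wildly different speed.
assert np.allclose(c_naive, c_blas)
print(f"naive: {t_naive:.4f}s  BLAS: {t_blas:.6f}s")
```

On a typical machine the BLAS call is orders of magnitude faster, which is why every language linked to the same BLAS scores about the same on this test.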

Mathematica and Octave probably use MKL, which plays the same role as OpenBLAS. This is why Fortran, C, Julia, Mathematica and MATLAB all perform the same: they are all linked to MKL or OpenBLAS.

There is a fantastic link on optimizing general matrix-matrix multiplication:
<http://wiki.cs.utexas.edu/rvdg/HowToOptimizeGemm>
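The central trick described there is cache blocking (tiling): compute the product one small panel at a time so the working set stays in cache. As a sketch of the loop structure only -- in Python the payoff is invisible since the tiles are still multiplied by BLAS, but in C or Fortran this reordering is where much of the speedup comes from; the block size 32 is an arbitrary illustration:

```python
import numpy as np

def blocked_matmul(a, b, bs=32):
    """Cache-blocked (tiled) matrix multiply: accumulate C tile by tile.

    Each bs-by-bs panel of A, B and C is touched repeatedly while it is
    (in a compiled implementation) still resident in cache.
    """
    n, m = a.shape
    m2, p = b.shape
    assert m == m2
    c = np.zeros((n, p))
    for i0 in range(0, n, bs):
        for k0 in range(0, m, bs):
            for j0 in range(0, p, bs):
                # NumPy slicing clamps at the edges, so ragged tiles are fine.
                c[i0:i0+bs, j0:j0+bs] += (
                    a[i0:i0+bs, k0:k0+bs] @ b[k0:k0+bs, j0:j0+bs]
                )
    return c

a = np.random.rand(100, 80)
b = np.random.rand(80, 90)
assert np.allclose(blocked_matmul(a, b), a @ b)
```

A real GEMM kernel adds register blocking and vectorized inner kernels on top of this, which is exactly what the hand-written assembly in OpenBLAS provides.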

Python uses NumPy, which is a wrapper around LAPACK, i.e. Fortran... Same for R, and I think Octave too.

One general piece of advice: do not try to optimise this yourself, use a library instead:

OpenBLAS/MKL/ACML for linear algebra
FFTW for Fourier transforms
SLEEF for trigonometric and other math functions
MIXMAX for random numbers
And so on...

Pascal
