With @simd and @inbounds you can halve the time (at least on my machine with Julia 0.4). Here is a very nice article that explains what BLAS actually does and what Julia doesn't do: http://nbviewer.ipython.org/url/math.mit.edu/~stevenj/18.335/Matrix-multiplication-experiments.ipynb
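Applied to the matmul from the quoted question below, the @simd/@inbounds suggestion might look like this. This is a minimal sketch, not code from the thread: the name matmul_simd is mine, and I have additionally swapped the loop order so the innermost loop runs down a column, which matches Julia's column-major array layout.

```julia
# Sketch of the quoted matmul with bounds checks removed and a SIMD hint.
# Loop order is j, k, i: the innermost loop over i walks down a column of
# both a and c, which is the cache-friendly direction for Julia arrays.
function matmul_simd(a, b)
    c = zeros(eltype(a), size(a, 1), size(b, 2))
    for j = 1:size(b, 2)
        for k = 1:size(b, 1)
            bkj = b[k, j]  # hoist the b element; it is constant in the inner loop
            @simd for i = 1:size(a, 1)
                @inbounds c[i, j] += a[i, k] * bkj
            end
        end
    end
    c
end
```

Whether @simd actually vectorizes the loop depends on the element type and the LLVM version, so the speedup will vary from machine to machine.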
Best,
Simon

On Sunday, 22 March 2015 14:44:19 UTC+1, Uliano Guerrini wrote:
>
> First look at Julia. I read somewhere that it is advised to de-vectorize
> code, so I just tried this:
>
> function matmul(a,b)
>     c = zeros(typeof(a[1,1]), (size(a,1), size(b,2)))
>     for j = 1:size(b,2)
>         for i = 1:size(a,1)
>             for k = 1:size(b,1)
>                 c[i,j] += a[i,k]*b[k,j]
>             end
>         end
>     end
>     c
> end
>
> function matmul2(a,b)
>     a*b
> end
>
> a = rand(2,3);
> b = rand(3,4);
> c = matmul(a,b);    # just to make the JIT
> c1 = matmul2(a,b);  # compile the functions ahead of @time
> a = rand(6000,500);
> b = rand(500,8000);
> @time matmul(a,b)
> @time matmul2(a,b)
>
> and I got this:
>
> elapsed time: 150.661463517 seconds (384000192 bytes allocated)
> elapsed time: 0.990317124 seconds (384000192 bytes allocated)
>
> The code for matrix multiplication, I assume, is some kind of BLAS, maybe
> in Fortran (or assembler?), maybe optimized for SSE2, and for sure using
> all 4 of my cores, so this is not the typical example where de-vectorizing
> is advisable...
>
> Nonetheless, isn't a factor of 150 a bit higher than expected? Did I miss
> something important in the matmul code?
>
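For reference, the question's guess is right: `*` on Float64 matrices dispatches to an optimized, multithreaded BLAS (dgemm). A sketch of the equivalent explicit call, using the stdlib LinearAlgebra API of current Julia (the matrix sizes here are just illustrative, smaller than in the post):

```julia
using LinearAlgebra

A = rand(200, 50)
B = rand(50, 300)

# A * B on Float64 matrices goes through BLAS dgemm under the hood; this
# explicit call computes 1.0 * A * B with no transposes ('N', 'N').
C = BLAS.gemm('N', 'N', 1.0, A, B)

# The BLAS thread count can also be controlled directly:
BLAS.set_num_threads(4)
```

That threading, plus cache blocking and hand-tuned SIMD kernels, is where most of the factor-of-150 gap over a naive triple loop comes from.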