With @simd and @inbounds you can roughly halve the time (at least on my 
machine, with Julia 0.4).
Here is a very nice article that explains what BLAS actually does and what 
Julia doesn't do:
http://nbviewer.ipython.org/url/math.mit.edu/~stevenj/18.335/Matrix-multiplication-experiments.ipynb
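For reference, here is one way to write it with those macros (a sketch, not a tuned implementation; `matmul_fast` is just a name I picked). Besides @inbounds and @simd, note the loop order: Julia arrays are column-major, so making i the innermost loop keeps the accesses to c and a contiguous in memory.

```julia
# Devectorized matrix multiply with bounds checks removed (@inbounds)
# and SIMD vectorization allowed on the inner loop (@simd).
function matmul_fast(a, b)
    c = zeros(eltype(a), size(a, 1), size(b, 2))
    for j = 1:size(b, 2)
        for k = 1:size(b, 1)
            bkj = b[k, j]  # hoist the loop-invariant factor
            # i innermost: c[:,j] and a[:,k] are walked in memory order
            @inbounds @simd for i = 1:size(a, 1)
                c[i, j] += a[i, k] * bkj
            end
        end
    end
    c
end
```

Even so, don't expect this to match a*b: OpenBLAS also blocks for cache and uses all cores, which is a big part of the remaining gap.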

Best,
Simon
On Sunday, March 22, 2015 at 14:44:19 UTC+1, Uliano Guerrini wrote:
>
> First look at Julia, I read somewhere that it is advised to de-vectorize 
> code so I just tried this:
>
> function matmul(a,b)
>     c = zeros(eltype(a), size(a,1), size(b,2))
>     for j = 1:size(b,2)
>         for i = 1:size(a,1)
>             for k = 1:size(b,1)
>                 c[i,j] += a[i,k]*b[k,j]
>             end
>         end
>     end
>     c
> end
>
>
> function matmul2(a,b)
>     a*b
> end
>
>
> a=rand(2,3);
> b=rand(3,4);
> c=matmul(a,b);   #just to make the JIT
> c1=matmul2(a,b); #compile the functions ahead of @time
> a=rand(6000,500);
> b=rand(500,8000);
> @time matmul(a,b);
> @time matmul2(a,b);
>
>
>
> and I got that:
>
> elapsed time: 150.661463517 seconds (384000192 bytes allocated)
> elapsed time: 0.990317124 seconds (384000192 bytes allocated)
>
>
> the code for matrix multiplication is, I assume, some kind of BLAS, maybe 
> in Fortran (or assembler?), maybe optimized for SSE2, and for sure using 
> all 4 of my cores, so this is not the typical example where de-vectorizing 
> is advisable...
>
>
> nonetheless, isn't a factor of 150 a bit higher than expected? Did I miss 
> something important in the matmul code?
>
>
