BLAS uses a combination of SIMD and multi-core processing. Multi-core 
support (threading) is coming in Julia v0.5 as an experimental feature. 
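
With the experimental threading in v0.5 you should be able to combine the 
two yourself: an outer Threads.@threads loop over columns with an @simd 
inner loop. A minimal sketch (assumes Julia is started with the 
JULIA_NUM_THREADS environment variable set; threaded_add! is an 
illustrative name, not a library function):

    # Start Julia with e.g. JULIA_NUM_THREADS=4 so columns split over 4 cores.
    function threaded_add!(C, A, B)
        # Outer loop: Threads.@threads divides the columns among the threads.
        Threads.@threads for j in 1:size(A, 2)
            # Inner loop: @simd vectorizes the elementwise add within a column.
            @inbounds @simd for i in 1:size(A, 1)
                C[i, j] = A[i, j] + B[i, j]
            end
        end
        return C
    end

Call it as threaded_add!(similar(A), A, B). For small matrices the 
threading overhead can easily outweigh the gain, so this mainly pays off 
on large arrays.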

On Saturday, 16 April 2016 14:13:00 UTC+9, Jason Eckstein wrote:
>
> I noticed that in Julia 0.4, if you call A+B where A and B are matrices of 
> equal size, the LLVM code shows vectorization, indicating it is equivalent 
> to writing my own function with an @simd-tagged for loop.  I still 
> notice, though, that it uses a single core to maximum capacity but never 
> spreads a SIMD loop out over multiple cores.  In contrast, if I use BLAS 
> functions like gemm! or even just A*B, it will use every core of the 
> processor.  I'm not sure if these linear algebra operations also use SIMD 
> vectorization, but I imagine they do since BLAS is very optimized.  Is there 
> a way to write a SIMD loop that spreads the data out across all processor 
> cores, not just the multiple functional units of a single core?
>
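
For reference, the single-core @simd loop described in the question would 
look something like this (a sketch; simd_add! is an illustrative name):

    # Single-threaded: @simd only uses the vector units of one core.
    function simd_add!(C, A, B)
        @inbounds @simd for i in eachindex(A)
            C[i] = A[i] + B[i]
        end
        return C
    end

The threaded version above wraps exactly this kind of inner loop in a 
Threads.@threads outer loop to get both forms of parallelism.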
