BLAS uses a combination of SIMD and multi-core processing. Multi-core support (threading) is coming in Julia v0.5 as an experimental feature.
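On a v0.5 development build you can already combine the two yourself. A minimal sketch (the function name threaded_add! is just illustrative, and the experimental Threads API may still change): Threads.@threads splits the outer loop across cores, while @simd vectorizes the inner loop on each core.

function threaded_add!(C, A, B)
    Threads.@threads for j in 1:size(A, 2)       # one chunk of columns per thread/core
        @inbounds @simd for i in 1:size(A, 1)    # SIMD within each thread's chunk
            C[i, j] = A[i, j] + B[i, j]
        end
    end
    return C
end

Note that the thread count is fixed at startup: launch Julia with, e.g., JULIA_NUM_THREADS=4, and check Threads.nthreads() to see how many threads you actually got.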
On Saturday, 16 April 2016 14:13:00 UTC+9, Jason Eckstein wrote:
>
> I noticed that in Julia 0.4, if you call A+B, where A and B are matrices of
> equal size, the LLVM code shows vectorization, indicating it is equivalent
> to writing my own function with an @simd-tagged for loop. I still
> notice, though, that it uses a single core to maximum capacity but never
> spreads an SIMD loop out over multiple cores. In contrast, if I use BLAS
> functions like gemm!, or even just A*B, it will use every core of the
> processor. I'm not sure if these linear algebra operations also use SIMD
> vectorization, but I imagine they do, since BLAS is very optimized. Is there
> a way to write an SIMD loop that spreads the data out across all processor
> cores, not just the multiple functional units of a single core?