That's interesting. I generally don't test with gcc, and my ICC/C experiments have shown LLVM/native threads to be something like 20% slower on some classes of benchmarks (like blackscholes) but 2-4x slower on others (like laplace-3d). The 20% may be attributable to ICC simply being better (including at vectorization, as you mention), but that certainly doesn't account for the 2-4x. The larger differences are still under investigation.
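If you want to try this kind of comparison yourself, a minimal sketch (the kernel here is just a stand-in, not one of the benchmarks above, and it assumes you've configured ParallelAccelerator to use its native-threads backend) would be to start Julia with threads enabled and time an @acc function:

    # Launch with native threads enabled, e.g.: JULIA_NUM_THREADS=4 julia
    using ParallelAccelerator

    # @acc routes the function body through ParallelAccelerator, so the
    # element-wise array operations below become parallel loops.
    @acc function axpy(a, x, y)
        return a .* x .+ y
    end

    x = rand(10^7); y = rand(10^7)
    axpy(2.0, x, y)           # warm-up run (triggers compilation)
    @time axpy(2.0, x, y)     # timed run

(This only exercises the map path; as noted below, fixes for reductions and stencils are coming in 0.2.1.)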
I guess something we have said in the docs or our postings has created the impression that our performance gains are somehow related to MKL, or to BLAS in general. If you have MKL, you can compile Julia itself to use it through its LLVM path (a minimal build sketch is at the end of this post). ParallelAccelerator does not insert calls to MKL where they didn't exist in the incoming IR, and I don't think ICC does either. If MKL calls do exist in the incoming IR, we don't modify them either.

On Wednesday, October 26, 2016 at 7:51:33 PM UTC-7, Ralph Smith wrote:
>
> This is great stuff. Initial observations (under Linux/GCC) are that
> native threads are about 20% faster than OpenMP, so I surmise you are
> feeding LLVM some very tasty code. (I tested long loops with
> straightforward memory access.)
>
> On the other hand, some of the earlier posts make me think that you were
> leveraging the strong vector optimization of the Intel C compiler and its
> tight coupling to MKL libraries. If so, is there any prospect of getting
> LLVM to take advantage of MKL?
>
>
> On Wednesday, October 26, 2016 at 8:13:38 PM UTC-4, Todd Anderson wrote:
>>
>> Okay, METADATA with ParallelAccelerator version 0.2 has been merged, so if
>> you do a standard Pkg.add() or update() you should get the latest version.
>>
>> For native threads, please note that we've identified some issues with
>> reductions and stencils; these have been fixed and the fixes will shortly
>> be released in version 0.2.1. I will post here again when that release
>> takes place.
>>
>> Again, please give it a try and report back with experiences or file bugs.
>>
>> thanks!
>>
>> Todd
>>
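For reference, hooking Julia up to MKL happens when you build Julia, not through ParallelAccelerator. A minimal sketch, assuming an existing MKL install (paths vary by installation):

    # In Make.user at the top of the Julia source tree:
    USE_INTEL_MKL = 1

    # Then build with the MKL environment sourced, e.g.:
    #   source /opt/intel/mkl/bin/mklvars.sh intel64
    #   make

And to pick up 0.2 (or 0.2.1 once it's out) from METADATA:

    Pkg.add("ParallelAccelerator")   # fresh install
    Pkg.update()                     # or update an existing install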