Could there be some alias checking going on? Anyway, this code seems to be horribly slow on 0.6, even with #19097.
to_indexes(::Int64, ::Int64, ::Vararg{Int64,N}) at operators.jl:868 (repeats 3 times) kills performance.

On Saturday, October 29, 2016 at 5:56:12 AM UTC+2, Jared Crean wrote:
>
> I'm working on a high-dimensional finite difference code, and I got a
> strange performance result. I have a kernel function that computes the
> stencil at a given point, and an outer function, outer_func, that loops
> over the dimensions and calls the kernel function at every grid point.
> I created a second function, outer_func2, with the same loops as
> outer_func, but rather than call the kernel function it has the contents
> of the kernel function copied into it. The source code is here:
> https://github.com/JaredCrean2/wave6d/blob/master/src/test_inline.jl
>
> The performance results (with bounds checking disabled and
> --math-mode=fast) are:
>
> testing outer_func
>   0.398586 seconds
>   0.398821 seconds
> testing outer_func2
>   2.522230 seconds
>   2.522479 seconds
>
> I ran this on an Intel Ivy Bridge (i7-3820) processor, using Julia 0.4.4.
>
> I looked at the llvm code (attached), and noticed outer_func2 has a bunch
> of extra statements that look like
>
>   %lsr.iv570 = phi i8* [ %scevgep571, %L21 ], [ %scevgep569, %L.preheader ]
>
> that are not present for outer_func. I don't know llvm code very well
> (hardly at all), so I'm not sure what these mean. Any help understanding
> either the llvm code or the performance difference would be appreciated.
>
> Thanks,
> Jared Crean
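
For anyone following along without the repository open, here is a minimal sketch of the kind of comparison described in the quoted message. It is a hypothetical 2D five-point stencil, not the 6D code from test_inline.jl: the stencil, the array size, and the `kernel` signature are my assumptions. Only the names `outer_func` and `outer_func2` are taken from the original post. I have not checked whether it reproduces the slowdown.

```julia
# Hypothetical 2D stand-in for the stencil kernel (assumption: a simple
# five-point difference; the real kernel in test_inline.jl differs).
@inline function kernel(u::AbstractMatrix{Float64}, i::Int, j::Int)
    return u[i-1, j] + u[i+1, j] + u[i, j-1] + u[i, j+1] - 4.0 * u[i, j]
end

# Variant 1: the outer loops call the kernel at every interior grid point.
function outer_func(u::Matrix{Float64}, out::Matrix{Float64})
    n, m = size(u)
    for j in 2:m-1, i in 2:n-1
        @inbounds out[i, j] = kernel(u, i, j)
    end
    return out
end

# Variant 2: same loops, with the kernel body copied in by hand.
function outer_func2(u::Matrix{Float64}, out::Matrix{Float64})
    n, m = size(u)
    for j in 2:m-1, i in 2:n-1
        @inbounds out[i, j] = u[i-1, j] + u[i+1, j] + u[i, j-1] + u[i, j+1] - 4.0 * u[i, j]
    end
    return out
end

u = rand(512, 512)
out1 = zeros(512, 512)
out2 = zeros(512, 512)

# Run each variant once first so compilation is not included in the timings.
outer_func(u, out1)
outer_func2(u, out2)

println("testing outer_func")
@time outer_func(u, out1)
println("testing outer_func2")
@time outer_func2(u, out2)
```

Both variants evaluate the same expression in the same order, so `out1` and `out2` should match exactly. Running `@code_llvm outer_func(u, out1)` and `@code_llvm outer_func2(u, out2)` then lets you compare the generated loops side by side.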