[julia-users] Re: Performance of Kernel Inlining

Jared Crean Sat, 29 Oct 2016 08:06:07 -0700

I noticed this morning that the loops are in the wrong order for a 
column-major array.  Reversing them, I get:


testing outer_func
  0.294904 seconds
  0.296689 seconds
testing outer_func2
  0.280391 seconds
  0.281223 seconds
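
A minimal sketch of the loop-order effect described above. It uses a hypothetical 2D array instead of the code in test_inline.jl, the function names are made up for illustration, and the syntax assumes a recent Julia release, not 0.4. Julia arrays are column-major, so the first index should be the innermost loop:

```julia
# Hypothetical stand-in for the loops in test_inline.jl (not the original code).
# Julia stores arrays column-major: A[i, j] and A[i+1, j] are adjacent in memory.

function sum_first_index_inner(A::Matrix{Float64})
    s = 0.0
    for j in 1:size(A, 2)      # outer loop: second (slow) index
        for i in 1:size(A, 1)  # inner loop: first (contiguous) index
            s += A[i, j]
        end
    end
    return s
end

function sum_second_index_inner(A::Matrix{Float64})
    s = 0.0
    for i in 1:size(A, 1)      # outer loop: first index
        for j in 1:size(A, 2)  # inner loop: strides across columns
            s += A[i, j]
        end
    end
    return s
end

A = rand(2000, 2000)
sum_first_index_inner(A); sum_second_index_inner(A)  # warm up (compile first)
@time sum_first_index_inner(A)
@time sum_second_index_inner(A)
```

Timings depend on the machine and array size, so none are quoted here.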

Now both versions have the phi instructions, so I guess that wasn't the 
problem.


And sprinkling a little @simd on the inner loops:

testing outer_func
  0.159910 seconds
  0.157640 seconds
testing outer_func2
  0.151384 seconds
  0.152224 seconds
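
As a rough illustration of what adding @simd to an inner loop looks like, here is a sketch on a hypothetical 1D second-difference stencil. The function name and the stencil are assumptions for illustration, not the kernel from test_inline.jl, and the syntax assumes a recent Julia release:

```julia
# Hypothetical 1D stencil, not the kernel from test_inline.jl.
# @simd tells the compiler it may reorder the loop iterations so the loop
# can be vectorized; @inbounds removes bounds checks inside the loop.
function second_difference!(out::Vector{Float64}, u::Vector{Float64})
    n = length(u)
    @assert length(out) == n
    @inbounds @simd for i in 2:n-1
        out[i] = u[i-1] - 2.0 * u[i] + u[i+1]
    end
    return out  # out[1] and out[n] are left untouched
end

u = [1.0, 4.0, 9.0, 16.0, 25.0]
out = zeros(length(u))
second_difference!(out, u)
```

@simd is only safe when the iterations do not depend on each other, which holds here because each out[i] reads only from u.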

I'm going to write a Fortran code to do a performance comparison, but this 
is looking pretty good.

Do you think I should file a performance issue for the original code?

  Jared Crean



On Saturday, October 29, 2016 at 4:13:48 AM UTC-4, Kristoffer Carlsson 
wrote:
>
> Could it be some alias checking going on?
>
> Anyway, this code is horribly slow on 0.6 (even with #19097) it seems.
>
> to_indexes(::Int64, ::Int64, ::Vararg{Int64,N}) at operators.jl:868 
> (repeats 3 times)
> kills performance.
>
>
> On Saturday, October 29, 2016 at 5:56:12 AM UTC+2, Jared Crean wrote:
>>
>> I'm working on a high-dimensional finite difference code, and I got a 
>> strange performance result. I have a kernel function that
>> computes the stencil at a given point, and an outer function, outer_func, 
>> that loops over the dimensions and calls the kernel function at every grid 
>> point.
>> I created a second function, outer_func2, with the same loops as 
>> outer_func, but rather than call the kernel function it has the contents of
>> the kernel function copied into it.  The source code is here: 
>> https://github.com/JaredCrean2/wave6d/blob/master/src/test_inline.jl
>>
>> The performance results (with bounds checking disabled and 
>> --math-mode=fast) are:
>>
>> testing outer_func
>>   0.398586 seconds
>>   0.398821 seconds
>> testing outer_func2
>>   2.522230 seconds
>>   2.522479 seconds
>>
>>
>>
>> I ran this on an Intel Ivy Bridge (i7-3820) processor, using Julia 0.4.4.
>>
>> I looked at the LLVM code (attached), and noticed outer_func2 has a bunch 
>> of extra statements that look like
>>
>>   %lsr.iv570 = phi i8* [ %scevgep571, %L21 ], [ %scevgep569, %L.preheader ]
>>
>>
>>
>> that are not present for outer_func.  I don't know LLVM code very well 
>> (hardly at all), so I'm not sure what these mean.  Any help
>> understanding either the LLVM code or the performance difference would be 
>> appreciated.
>>
>>
>>
>>   Thanks,
>>      Jared Crean
>>
>
