Re: [RFC] Tree loop unroller pass

Richard Biener Fri, 16 Feb 2018 07:01:39 -0800

On February 16, 2018 3:22:22 PM GMT+01:00, Wilco Dijkstra 
<wilco.dijks...@arm.com> wrote:
>Richard Biener wrote:
>>> This is a great plan - GCC urgently requires a good unroller!
>>
>> How so?
>
>I thought it is well-known for many years that the rtl unroller doesn't
>work properly.
>In practically all cases where LLVM beats GCC, it is due to unrolling
>small loops.
>
>You may have noticed how people have been enabling
>-fprefetch-loop-arrays by default in some AArch64 configurations and
>then stripping out most/all prefetches in order to get the effect of
>tree unrolling... However the unroll parameters of this pass are even
>worse than -funroll-loops, so it ends up using crazy unroll factors.
>
>> To generate more ILP for modern out-of-order processors you need to
>> be able to do followup transforms that remove dependences.  So rather
>> than inventing magic params we should look at those transforms and key
>> unrolling on them.  Like we do in predictive commoning or other passes
>> that end up performing unrolling as part of their transform.
>
>This is why unrolling needs to be done at the tree level. Alias info is
>correct, addressing modes end up more optimal and the scheduler can now
>interleave the iterations (often not possible after the rtl-unroller
>due to bad alias info).
> 
>> Our measurements on x86 concluded that unrolling isn't worth it, in
>> fact it very often hurts.  That was of course with saner params than
>> the defaults of the RTL unroller.
>>
>> Often you even have to fight with followup passes doing stuff that
>> ends up increasing register pressure too much so we end up spilling.
>
>Yes that's why I mentioned we should only unroll small loops where
>there is always a benefit from reduced loop counter increments and
>branching.
>
>> So _please_ first get testcases we know unrolling will be beneficial
>> on and _also_ have a thorough description _why_.
>
>I'm sure we can find good examples. The why will be obvious just from
>instruction count.


With OoO CPUs speculatively executing the next iterations I very much doubt
that.

Richard. 

>Wilco
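
The distinction drawn in the thread between unrolling alone and unrolling
combined with a followup transform that removes dependences can be sketched
with a simple reduction. This is only an illustration, not output from GCC
or code from any proposed pass; the function names and the unroll factor of 2
are hypothetical choices made for the example. Integer data is used so that
reassociating the sum cannot change the result.

```c
#include <stddef.h>

/* Baseline: one accumulator, so each add depends on the previous one. */
long sum_simple(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled by 2 only (hypothetical factor): half the counter increments
   and branches, but the adds still form a single serial chain through s. */
long sum_unrolled(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s += a[i];
        s += a[i + 1];
    }
    if (i < n)          /* leftover element when n is odd */
        s += a[i];
    return s;
}

/* Unrolled by 2 plus a followup transform: the accumulator is split, so
   the two adds in each iteration are independent of each other. */
long sum_split(const int *a, size_t n)
{
    long s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)          /* leftover element when n is odd */
        s0 += a[i];
    return s0 + s1;
}
```

In sum_unrolled the only saving is loop overhead, which is the instruction
count argument; in sum_split the dependence chain itself is shortened, which
is the kind of followup transform the ILP argument relies on. Whether either
variant is actually faster on a given out-of-order core is exactly what the
thread disputes, and the sketch makes no claim about that.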
