Re: [PATCH, GCC, AArch64] Fix PR88398 for AArch64

Wilco Dijkstra Fri, 15 Nov 2019 09:14:31 -0800

Hi Richard,

> So what do we actually do unpatched with -funroll-loops here?


Yes, so it does the insane "fully unrolled trailing loop before the unrolled
loop" thing. One should always place the trailing loop last (and typically as
an actual loop, of course); the code then ends up much faster, close to
the ideal version shown in the PR.
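
To make the two layouts concrete, here is a hand-written sketch in C. It is
not GCC output and not the loop from the PR: the function names, the simple
summation body and the unroll factor of 4 are all assumptions chosen purely
for illustration. The first function runs the leftover iterations as fully
unrolled code before the unrolled loop; the second runs the unrolled loop
first and handles the leftovers last, as an actual loop.

```c
#include <stddef.h>

/* Hypothetical illustration only (not GCC output); unroll factor 4. */

/* Layout 1: fully unrolled leftover iterations BEFORE the unrolled loop. */
long sum_remainder_first(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    size_t rem = n % 4;

    /* Peeled copies of the body, each guarded by its own branch. */
    if (rem >= 1) s += a[i++];
    if (rem >= 2) s += a[i++];
    if (rem >= 3) s += a[i++];

    /* Unrolled main loop: n - i is now a multiple of 4. */
    for (; i < n; i += 4)
        s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    return s;
}

/* Layout 2: unrolled loop first, leftovers LAST as an ordinary loop. */
long sum_remainder_last(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;

    /* Unrolled main loop runs while at least 4 elements remain. */
    for (; i + 4 <= n; i += 4)
        s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];

    /* Trailing loop handles the remaining 0..3 elements. */
    for (; i < n; i++)
        s += a[i];
    return s;
}
```

Both functions compute the same result; only the placement and shape of the
leftover iterations differ. For short trip counts the first layout spends
all its time in the chain of peeled, branchy copies and never reaches the
unrolled loop.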

> When I force "stupid" unrolling the unrolled
> part is clearly worse and I see no benefit from unrolling here
> (eventually over the histogram of the number of iterations the
> repeated exits are now better predicted because they are duplicated).

For these kinds of loops, stupid unrolling is clearly better than the
default unrolling, both in size and in performance. For the example
we only ever execute part of the "trailing" loop, and never enter the
unrolled main loop!

> I wonder if structuring  it more like the "stupid" case would be
> better, thus

The key thing is to avoid emitting an unrolled trailing loop before
the actual unrolled loop. The stupid strategy can already do that,
hence it is often faster than unrolling badly.

> that would cater for your case, does not require -funroll-all-loops
> or a new target hook.  It might be that all that is needed is
> reordering of basic blocks with the current scheme to decrease
> branch density.  Alternatively limit the unroll factor or pad
> the prologue jumps to make the branch predictor happy.

We shouldn't treat symptoms but fix the underlying problems.

Wilco
