Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

Richard Biener via Gcc-patches Mon, 25 May 2020 23:59:00 -0700

On Mon, May 25, 2020 at 7:44 PM Segher Boessenkool
<seg...@kernel.crashing.org> wrote:
>
> On Mon, May 25, 2020 at 02:39:54PM +0200, Richard Biener wrote:
> > On Fri, May 22, 2020 at 6:54 PM Segher Boessenkool
> > <seg...@kernel.crashing.org> wrote:
> > > > The split above allows the "bug" to be fixed (even on the branch)
> > > > without introducing even more target specialities.
> > >
> > > So does any split.  Or I don't see what you mean?
> >
> > Well, a split that does not affect behavior for non-ppc architectures
> > when the flags given by users are unchanged.  Because that allows
> > the ppc regression to be fixed on the branch.
> >
> > Then, on trunk, we can think of a better overall flag design.  Note
>
> Oh, as just a (very) temporary thing, it is fine of course (it should
> say it is then though).
>
> > that cunroll/cunrolli are not controlled by a flag currently, they
> > are gated on optimize >= [2|3] - it's just that either -funroll-loops
> > or -fpeel-loops causes its heuristics to allow code-size growth
> > by its own metrics according to the unroll --params.
> >
> > So it's a bit difficult to retrofit the heuristic behavior onto new
> > flags unless we want to completely move that over to a --param
> > that maybe gets adjusted by -funroll-loops.
>
> Yes, cunroll does not have its own option, and that is a problem.  But
> that is easy to fix!  Either with an option, or just with params (the
> option wouldn't do more than set params anyway?)


Well, given that coming up with different names for essentially the same
transform is going to be challenging, how about something like

-funroll-loops={early,late,static,dynamic}[insert better names here]

Note there's also -fpeel-loops, which may match the transform
done on GIMPLE better?  I'm not sure which are the technically
correct terms for unrollings that elide the loop (the backedge).
We do this kind of unrolling even if we cannot statically
decide which of a set of possible exits we take (internally we
call that peeling; if we can statically decide, we call it complete
unrolling).  The RTL side OTOH only performs classical unrolling,
preserving the backedge with various strategies for the
remaining iterations.
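
To make the three cases concrete, here is a hand-written C sketch of
what each transform amounts to at the source level.  It is only an
illustration: the function names are made up for this example, the
trip count of 4 and the unroll factor of 4 are arbitrary assumptions,
and it is not the output of any GCC pass.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled reference loop; the trip count (4) is known at compile time. */
int sum4_rolled(const int *a)
{
    int s = 0;
    for (int i = 0; i < 4; i++)
        s += a[i];
    return s;
}

/* Complete unrolling: the exit taken is statically known, so the
   loop and its backedge disappear entirely. */
int sum4_complete(const int *a)
{
    return a[0] + a[1] + a[2] + a[3];
}

/* At most 4 iterations, but which exit is taken depends on the data. */
int find4_rolled(const int *a, int key)
{
    for (int i = 0; i < 4; i++)
        if (a[i] == key)
            return i;
    return -1;
}

/* "Peeling" in the sense used above: every iteration is copied out and
   keeps its own early exit; no backedge remains. */
int find4_peeled(const int *a, int key)
{
    if (a[0] == key) return 0;
    if (a[1] == key) return 1;
    if (a[2] == key) return 2;
    if (a[3] == key) return 3;
    return -1;
}

/* Classical unrolling by an assumed factor of 4: the backedge is
   preserved, and a remainder loop handles the leftover iterations
   (one of several possible strategies). */
int sum_unrolled_by_4(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    int s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)
        s += a[i];
    return s;
}
```

The first two pairs have no loop left afterwards, which is why they
behave differently with respect to code-size growth than the last one,
where the loop survives with a larger body.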

As said, for the regression on the 10 branch with ppc I'd add
[a hidden] flag that controls the RTL unroller, also set by
-funroll-loops and triggered by the ppc specific heuristics.

Richard.

>
> Segher
