On Tue, May 26, 2020 at 6:58 AM Jiufu Guo <guoji...@linux.ibm.com> wrote:
>
> David Edelsohn <dje....@gmail.com> writes:
>
> > On Mon, May 25, 2020 at 1:58 PM Richard Biener
> > <richard.guent...@gmail.com> wrote:
> >>
> >> On May 25, 2020 7:40:00 PM GMT+02:00, Segher Boessenkool 
> >> <seg...@kernel.crashing.org> wrote:
> >> >On Mon, May 25, 2020 at 02:14:02PM +0200, Richard Biener wrote:
> >> >> On Mon, May 25, 2020 at 1:10 PM guojiufu <guoji...@linux.ibm.com>
> >> >wrote:
> >> >> Since a new flag is not needed to fix the regression please avoid
> >> >> adding -fcomplete-unroll-loops.
> >> >>
> >> >> For -frtl-unroll-loops you should be able to use
> >> >
> >> >Erm.  That *is* a new command-line option (the internal flags I do not
> >> >care about so much: new implementation details are *good*).  And a new
> >> >name that is a mistake in my opinion, for many reasons (users do not
> >> >know and should not have to care about "rtl"; the name is not
> >> >descriptive; it is useless churn, it is not the same name as we have
> >> >had for decades now; it is adding a new option for a future where we
> >> >will do most unrolling at gimple level, a future we do not know will
> >> >ever exist, and we do not know what that will look like anyway; it is
> >> >an extra level of indirection (in the name)).
> >> >
> >> >We should not have an -frtl-unroll-loops if we do not have a
> >> >-ftree-unroll-loops (or whatever).
> >> >
> >> >Unrolling early is not a good idea in general (the problems with the
> >> >very trivial complete unrolling case just underline that).  But we
> >> >*should* know which code we expect to unroll later, for better costing.
> >> >Adding names like "rtl-unroll-loops" only stands in the way of getting
> >> >a better design here.
> >>
> >> You folks made ppc specific hacks instead of a better design. Those
> >> now stand in the way as well. But sure, simply do not expose the
> >> flag to the users, use
> >> Var(flag_rtl_unroll_loops). My other points still stand.
> >>
> >> Feel free to ignore the regression part on the branch and come up
> >> with a great design. But don't expect to backport that then.
> >
> > I completely agree.
>
> Thanks a lot for all your comments, suggestions, and tips in the
> discussion.  Thank Richar, Segher, David, Hanza, and all!
>
> I may have an explanation about the intention of this work.
>
> We know that loop unrolling is a complex and tickly thing.  It could
> help some kinds of code in a great manner.  Sometimes there are side
> effects.  For different types of loop and different platforms, it may
> result in different effects.
> It would makes sense to tune the loop unrolling accordingly.  And so, to
> help and tune loop unrolling is what we want to do.
>
> Currently, we have loop unroller at GIMPLE part (cunroll/cunrolli) and
> RTL part.  There are some options (like -funroll-loops) and --params to
> control unrollers.
>
> Through target hook, it would be helpful for different platforms to tune
> unroller: checking the type of loops, check optimization level.
> Existing hooks may help with something, like turn --params.
>
> Adding separate flags(or options) may be helpful to control different
> behaviors independently.  This is one reason for the patch which
> introduces internal undocumented options.
>
> One previous patch, r10-4525, is tunning for ppc at -O2. Which
> implements an existing hook for rs6000 to check simple loops for RTL
> unroller. For cunroll, it just enables it even increasing size at -O2
> directly, without check the type of the loops.  And then the
> side/negative effects of cunroll are also visible at -O2 besides
> positive effects.  In PR95018, the side effect is shown on complex loop
> (early exit, and more peeling).
> One idea is for cunroll to tune it to avoid side effects. And if the
> heuristic is suitable, it would be helpful for other usage, like -O3 and
> -funroll-loops.
>
> Thanks for any comments!

For GIMPLE level transforms I don't think targets have more knowledge
than the middle-end.  In fact GIMPLE complete unrolling is about
secondary effects, removing redundancies and abstraction.  So IMHO
the correct approach is to look at individual cases and try to improve
the generic code rather than try to get better benchmark results
on a per-target manner by magical parameter tuning.

For what the RTL unroller does it indeed depends very heavily on
the target whether sth is beneficial or not.

So I'd like to see specific cases where you think cunroll should
do "better" on powerpc only but not elsewhere.

Richard.

> Jiufu
>
> >
> > This path is digging a deeper and deeper hole.
> >
> > - David

Reply via email to