Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

Jiufu Guo via Gcc-patches Thu, 28 May 2020 00:12:23 -0700

Richard Biener <richard.guent...@gmail.com> writes:

> On Wed, May 27, 2020 at 6:36 AM Jiufu Guo <guoji...@linux.ibm.com> wrote:
>>
>> Segher Boessenkool <seg...@kernel.crashing.org> writes:
>>
>> > Hi!
>> >
>> > On Tue, May 26, 2020 at 08:58:13AM +0200, Richard Biener wrote:
>> >> On Mon, May 25, 2020 at 7:44 PM Segher Boessenkool
>> >> <seg...@kernel.crashing.org> wrote:
>> >> > Yes, cunroll does not have its own option, and that is a problem.  But
>> >> > that is easy to fix!  Either with an option, or just with params (the
>> >> > option wouldn't do more than set params anyway?)
>> >>
>> >> Well, given coming up with different names for essentially the same
>> >> transform is going to be challenging how about sth like
>> >>
>> >> -funroll-loops={early,late,static,dynamic}[insert better names here]
>> >
>> > User interface is hard :-)  I think luckily we don't need to change
>> > anything there yet, just have an internal flag?
>> >
>> > But complete unrolling is something quite different, so it should have
>> > its own flag anyway (at least internally).
>> >
>> >> note there's also -fpeel-loops which may match the transform
>> >> done on GIMPLE better?
>> >
>> > Peeling and unrolling are the same thing, if doing complete unrolling
>> > (or complete peeling), followed by DCE in both cases.  Peeling is a
>> > nicer name here I think, yeah.
>> >
>> >> I'm not sure which are the technically
>> >> correct terms for unrollings that elide the loop (the backedge).
>> >
>> > I don't know a better term than "complete", I don't remember ever seeing
>> > something else either.
>>
>> How about "Var(flag_cunroll_grow_size) EnabledBy(funroll-loops ||
>> funroll-all-loops || fpeel-loops)" Or flag_cunroll_allow_grow_size?
>>
>> And then using this flags as:
>>   unsigned int val = tree_unroll_loops_completely (flag_cunroll_grow_size
>>                                                    || optimize >= 3, true);
>>
>> And we do not need to enable this flag at -O2.
>
> Sure this works for me.  Note I'd make funroll-loops enabled by
> funroll-all-loops so you could simplify the above.


After some checking, I did not use 'EnabledBy(funroll-all-loops)' for
funroll-loops, because of special behavior on -fno-unroll-all-loops.
Command line -fno-unroll-all-loops turns off flag_unroll_loops, if
flag_unroll_loops is enabled implicitly(not through command line). 

While the orignal logic is: only possitive flag_unroll_all_loops
overrides flag_unroll_loops. 
"if (flag_unroll_all_loops) flag_unroll_loops = 1;" in process_options.

Thanks.
Jiufu

>
> Richard.
>
>> Thanks for all your helpful comments again!
>>
>> Jiufu
>>
>> >
>> >> We're doing such kind of unrolling even if we cannot statically
>> >> decide which of a set of possible exits we take (and internally
>> >> call that peeling, if we can statically decide we call it complete
>> >> unrolling).
>> >
>> > "Peeling" is placing some copies of the loop before the loop;
>> > "unrolling" is placing a few copies of the loop inside the loop body.
>> > Does that match usage here?
>> >
>> >> The RTL side OTOH only performs classical unrolling,
>> >> preserving the backedge with various strategies for the
>> >> remaining iterations.
>> >
>> > And if you do complete unrolling that way, the backedge can be removed,
>> > since it can be shown never to be taken.
>> >
>> >> As said, for the regression on the 10 branch with ppc I'd add
>> >> [a hidden] flag that controls the RTL unroller, also set by
>> >> -funroll-loops and triggered by the ppc specific heuristics.
>> >
>> > But the problem is in cunroll?  This is so backwards...  Because some
>> > other transform abuses the unroller flags, adding a second level flag
>> > with the same meaning :-(  It will work for fixing the regression,
>> > sure, and it is slightly less code as well.
>> >
>> >
>> > Segher

Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

Reply via email to