Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

Jan Hubicka Thu, 21 May 2020 01:04:09 -0700

> Segher Boessenkool <seg...@kernel.crashing.org> writes:
> 
> > On Wed, May 20, 2020 at 12:30:30PM +0200, Richard Biener wrote:
> >> I think this is the wrong way to approach this.  You're doing too many
> >> things at once.  Try to fix the powerpc regression with the extra
> >> flag_rtl_unroll_loops, that could be backported.  Then you can
> 
> Or flag_complete_unroll_loops(-fcomplete-unroll-loops) for GIMPLE
> cunroll?
> >> independently see whether enabling more unrolling at -O2 makes
> >> sense.  Because currently we _do_ unroll at -O2 when it does
> >> not increase size.  It's just your patches make this as aggressive
> >> as -O3.
> 
> I'm also thinking about enabling more cunroll at -O2, even with some size
> increase.  Enabling cunroll fully makes it like -O3.  As some
> discussions in PRs (e.g. PR88760) suggest, unrolling small/simple loops may
> benefit some platforms (but not all platforms, like x86_64?).  This
> would require a target-specific hook.  Or we could use a generic
> setting: unroll/peel up to a limited number of times if the loop body is
> simple and small, together with a target-specific hook.


We now have --params that can be tuned differently for -O2 and -O3, so
looking into cunroll was one of my todos for the GCC 10 -O2 retuning, but I
did not get any conclusive benchmark results outside SPEC.
I planned to return to it next stage1, so it may be a good time.
Do you have any benchmarks on ppc?
Of course there is no need to keep the same defaults for all targets, but in
general having target-specific defaults increases the number of knobs we
need to check and keep up to date.
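
To make the kind of loop under discussion concrete, here is a minimal
sketch of a small, simple loop with a constant trip count. The function
name dot4 and the trip count of 4 are hypothetical, chosen only for
illustration; they do not come from the patch or from any benchmark.

```c
/* Hypothetical example: a small loop body with a compile-time-constant
   trip count, i.e. a typical candidate for complete unrolling.  */
enum { N = 4 };

int
dot4 (const int *a, const int *b)
{
  int sum = 0;

  /* Four iterations, one multiply-add each.  Whether the complete
     unroller expands this at -O2 depends on its size limits.  */
  for (int i = 0; i < N; i++)
    sum += a[i] * b[i];

  return sum;
}
```

One way to see what cunroll decides is to compile with something like
"gcc -O2 -c -fdump-tree-cunroll-details dot4.c" and read the dump, then
repeat with a limit such as "--param max-completely-peeled-insns=<n>"
changed. What the dump shows will vary with the GCC version and the
target, so this is only a starting point for experiments, not a result.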

Honza
> 
> Any comments? Thanks!
> Jiufu
> 
> >
> > Just do a separate flag (and option) for cunroll, instead?
> >
> > The RTL unroller is *the* unroller, and has been since forever.
> >
> >
> > Segher
