Jan Hubicka <hubi...@ucw.cz> writes:

>> Segher Boessenkool <seg...@kernel.crashing.org> writes:
>>
>> > On Wed, May 20, 2020 at 12:30:30PM +0200, Richard Biener wrote:
>> >> I think this is the wrong way to approach this. You're doing too many
>> >> things at once. Try to fix the powerpc regression with the extra
>> >> flag_rtl_unroll_loops, that could be backported. Then you can
>>
>> Or flag_complete_unroll_loops(-fcomplete-unroll-loops) for GIMPLE
>> cunroll?
>> >> independently see whether enabling more unrolling at -O2 makes
>> >> sense. Because currently we _do_ unroll at -O2 when it does
>> >> not increase size. It's just your patches make this as aggressive
>> >> as -O3.
>>
>> I'm also thinking about enabling more cunroll at -O2 even with some size
>> increasing. Full cunroll enablement make it like -O3. As some
>> discussion in PRs (e.g. PR88760), small/simple loops unrolling may be in
>> favor of some platforms (but not for all platforms, like x86_64?). This
>> would make us to have target specified hook. Or do some generic
>> setting: accept to unroll/peel limit times if the loop body is simple
>> and small, together with target specific hook.
>
> We now have --params that can be tuned differently for -O2 and -O3 so
> looking into cunroll was one of my todo for GCC 10 -O2 retuning but I did
> not get any very conclusive benchmark results outside SPEC.
> I planned to return to it next stage1, so it may be good time.
> Do you have any benchmarks on ppc?
541.leela_r, 548.exchange2_r and 557.xz_r from SPEC2017 are visibly
affected by cunroll. They can be used to tune cunroll, I think.

> Of course there is no need to keep same defaults for all targets, but in
> general having target specific defaults increases number of knobs we
> need to check and keep up to date.

Thanks,
Jiufu

>
> Honza
>>
>> Any comments? Thanks!
>> Jiufu
>>
>> >
>> > Just do a separate flag (and option) for cunroll, instead?
>> >
>> > The RTL unroller is *the* unroller, and has been since forever.
>> >
>> >
>> > Segher
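
For reference, the kind of "simple and small" loop discussed above can be sketched as below. This is only an illustration, not a test case from the patches or from SPEC: the function names (dot4_rolled, dot4_unrolled, check_dot4) are made up here, and whether GCC actually unrolls the first form at -O2 depends on the version, the target and the --param limits in effect.

```c
#include <assert.h>

/* Hypothetical example: a loop with a constant trip count of 4 and a
   tiny body, i.e. a candidate for complete unrolling (cunroll).  */
int dot4_rolled(const int *a, const int *b)
{
    int sum = 0;
    for (int i = 0; i < 4; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hand-written equivalent of the fully unrolled loop: no counter,
   no compare, no branch, at the cost of more instructions.  */
int dot4_unrolled(const int *a, const int *b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

/* Sanity check that both forms compute the same value.  */
void check_dot4(void)
{
    const int a[4] = {1, 2, 3, 4};
    const int b[4] = {5, 6, 7, 8};
    assert(dot4_rolled(a, b) == dot4_unrolled(a, b));
}
```

Compiling with something like "gcc -O2 -fdump-tree-cunroll-details" should show what the GIMPLE cunroll pass decided for dot4_rolled, and the existing limits such as --param max-completely-peel-times and --param max-completely-peeled-insns can be varied to see where the decision changes.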