Richard Biener <richard.guent...@gmail.com> writes: > On Wed, May 20, 2020 at 10:27 AM Jiufu Guo <guoji...@linux.ibm.com> wrote: >> >> Richard Biener <richard.guent...@gmail.com> writes: >> >> > On Wed, May 20, 2020 at 5:56 AM Jiufu Guo via Gcc-patches >> > <gcc-patches@gcc.gnu.org> wrote: >> >> >> >> Hi, >> >> >> >> In r10-4525, and r10-4161, loop unroller was enabled for simply loops at >> >> -O2. >> >> At the same time, the GIMPLE cunroll is also enabled, while it is not >> >> only for >> >> simple loops. This patch introduces a hook to check if a loop is >> >> suitable to >> >> unroll completely. The hook can be used to check if a loop is simply >> >> enough >> >> for complete unroll, then avoid negative effects like size increasing or >> >> complex loop unrolling at -O2. >> >> >> >> This patch could help to handle the code in PR95018 which contains complex >> >> loop, and does not cause obvious performance change on SPEC2017. >> >> >> >> Bootstrap/regtest pass on powerpc64le. OK for trunk? >> > >> > I fear you're just digging the rs6000 specific hole you dug up deeper >> > with this ... >> >> Yes, this is because r10-4525 and r10-4161 adjust RTL unroller hook for >> rs6000, and they have the effect to enable GIMPLE cunroll at O2 for all >> loops. So, I introduce this patch to adjust GIMPLE cunroll similarly. > > I think a better approach may be to leave GIMPLE unroll untouched by > changing the earlier patch to set a new flag_rtl_unroll_loops > (where -funroll-loops would enable -frtl-unroll-loops) and gate the RTL > unroller on flag_rtl_unroll_loops which you can enable separately > on rs6000. > > That is, not add even more rs6000 specialities but restrict the earlier > one to what it was supposed to do - the GIMPLE effects were clearly > not your goal.
I also lean to enable GIMPLE cunroll for O2 (at that time only for rs6000), especially for those loops which are simple and not increase size too much. Thanks Richard. > > Richard. > >> Thanks, >> Jiufu >> >> > >> > Richard. >> > >> >> Jiufu >> >> >> >> 2020-05-20 Jiufu Guo <guoji...@cn.ibm.com> >> >> >> >> PR target/95018 >> >> * target.def (loop_allow_unroll_completely_peel): New hook. >> >> * doc/tm.texi (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): New >> >> hook. >> >> * doc/tm.texi.in (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): New >> >> hook. >> >> * config/rs6000/rs6000.c >> >> (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): New hook. >> >> (rs6000_loop_allow_unroll_completely_peel): New hook >> >> implementation. >> >> * tree-ssa-loop-ivcanon.c >> >> (canonicalize_loop_induction_variables): Call >> >> hook loop_allow_unroll_completely_peel. >> >> >> >> >> >> --- >> >> gcc/config/rs6000/rs6000.c | 19 +++++++++++++++++++ >> >> gcc/doc/tm.texi | 9 +++++++++ >> >> gcc/doc/tm.texi.in | 2 ++ >> >> gcc/target.def | 12 ++++++++++++ >> >> gcc/tree-ssa-loop-ivcanon.c | 5 +++++ >> >> 5 files changed, 47 insertions(+) >> >> >> >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c >> >> index 8435bc15d72..a1a3f9cb583 100644 >> >> --- a/gcc/config/rs6000/rs6000.c >> >> +++ b/gcc/config/rs6000/rs6000.c >> >> @@ -1433,6 +1433,9 @@ static const struct attribute_spec >> >> rs6000_attribute_table[] = >> >> #undef TARGET_LOOP_UNROLL_ADJUST >> >> #define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust >> >> >> >> +#undef TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL >> >> +#define TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL >> >> rs6000_loop_allow_unroll_completely_peel >> >> + >> >> #undef TARGET_INIT_BUILTINS >> >> #define TARGET_INIT_BUILTINS rs6000_init_builtins >> >> #undef TARGET_BUILTIN_DECL >> >> @@ -5139,6 +5142,22 @@ rs6000_loop_unroll_adjust (unsigned nunroll, >> >> struct loop *loop) >> >> return nunroll; >> >> } >> >> >> >> +/* Implement targetm.loop_allow_unroll_completely_peel. */ >> >> + >> >> +static bool >> >> +rs6000_loop_allow_unroll_completely_peel (HOST_WIDE_INT maxiter, tree >> >> niter, >> >> + struct loop *loop) >> >> +{ >> >> + if (unroll_only_small_loops && optimize == 2) >> >> + { >> >> + if (maxiter >= 4 >> >> + && !(TREE_CODE (niter) == INTEGER_CST && single_exit (loop))) >> >> + return false; >> >> + } >> >> + >> >> + return true; >> >> +} >> >> + >> >> /* Handler for the Mathematical Acceleration Subsystem (mass) interface >> >> to a >> >> library with vectorized intrinsics. */ >> >> >> >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi >> >> index 6e7d9dc54a9..f7419872c2f 100644 >> >> --- a/gcc/doc/tm.texi >> >> +++ b/gcc/doc/tm.texi >> >> @@ -11893,6 +11893,15 @@ is required only when the target has special >> >> constraints like maximum >> >> number of memory accesses. >> >> @end deftypefn >> >> >> >> +@deftypefn {Target Hook} bool TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL >> >> (HOST_WIDE_INT @var{maxiter}, tree @var{niter}, class loop *@var{loop}) >> >> +This target hook check if the loop @var{loop} is suitable to unroll >> >> +completely on the target. The parameter @var{maxiter} is the possible max >> >> +bound of iterations. The parameter @var{niter} is the number expression >> >> of >> >> +iterations the loop is executed. The parameter @var{loop} is a pointer to >> >> +the loop. This target hook is required only when the target has special >> >> +constraints. >> >> +@end deftypefn >> >> + >> >> @defmac POWI_MAX_MULTS >> >> If defined, this macro is interpreted as a signed integer C expression >> >> that specifies the maximum number of floating point multiplications >> >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in >> >> index 3be984bbd5c..4ae079a650c 100644 >> >> --- a/gcc/doc/tm.texi.in >> >> +++ b/gcc/doc/tm.texi.in >> >> @@ -8010,6 +8010,8 @@ lists. >> >> >> >> @hook TARGET_LOOP_UNROLL_ADJUST >> >> >> >> +@hook TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL >> >> + >> >> @defmac POWI_MAX_MULTS >> >> If defined, this macro is interpreted as a signed integer C expression >> >> that specifies the maximum number of floating point multiplications >> >> diff --git a/gcc/target.def b/gcc/target.def >> >> index 07059a87caf..3842ba611a2 100644 >> >> --- a/gcc/target.def >> >> +++ b/gcc/target.def >> >> @@ -2709,6 +2709,18 @@ number of memory accesses.", >> >> unsigned, (unsigned nunroll, class loop *loop), >> >> NULL) >> >> >> >> +/* Return if a loop is suitable for complete unroll. */ >> >> +DEFHOOK >> >> +(loop_allow_unroll_completely_peel, >> >> + "This target hook check if the loop @var{loop} is suitable to unroll\n\ >> >> +completely on the target. The parameter @var{maxiter} is the possible >> >> max\n\ >> >> +bound of iterations. The parameter @var{niter} is the number expression >> >> of\n\ >> >> +iterations the loop is executed. The parameter @var{loop} is a pointer >> >> to\n\ >> >> +the loop. This target hook is required only when the target has >> >> special\n\ >> >> +constraints.", >> >> + bool, (HOST_WIDE_INT maxiter, tree niter, class loop *loop), >> >> + NULL) >> >> + >> >> /* True if X is a legitimate MODE-mode immediate operand. */ >> >> DEFHOOK >> >> (legitimate_constant_p, >> >> diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c >> >> index 8ab6ab3330c..52d45a90a56 100644 >> >> --- a/gcc/tree-ssa-loop-ivcanon.c >> >> +++ b/gcc/tree-ssa-loop-ivcanon.c >> >> @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. If not see >> >> #include "system.h" >> >> #include "coretypes.h" >> >> #include "backend.h" >> >> +#include "target.h" >> >> #include "tree.h" >> >> #include "gimple.h" >> >> #include "cfghooks.h" >> >> @@ -1231,6 +1232,10 @@ canonicalize_loop_induction_variables (class loop >> >> *loop, >> >> /* Force re-computation of loop bounds so we can remove redundant >> >> exits. */ >> >> maxiter = max_loop_iterations_int (loop); >> >> >> >> + if (allow_peel && targetm.loop_allow_unroll_completely_peel >> >> + && !targetm.loop_allow_unroll_completely_peel (maxiter, niter, >> >> loop)) >> >> + return false; >> >> + >> >> if (dump_file && (dump_flags & TDF_DETAILS) >> >> && TREE_CODE (niter) == INTEGER_CST) >> >> { >> >> -- >> >> 2.17.1 >> >>