Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

Jiufu Guo via Gcc-patches Wed, 20 May 2020 01:59:20 -0700

Richard Biener <richard.guent...@gmail.com> writes:

> On Wed, May 20, 2020 at 10:27 AM Jiufu Guo <guoji...@linux.ibm.com> wrote:
>>
>> Richard Biener <richard.guent...@gmail.com> writes:
>>
>> > On Wed, May 20, 2020 at 5:56 AM Jiufu Guo via Gcc-patches
>> > <gcc-patches@gcc.gnu.org> wrote:
>> >>
>> >> Hi,
>> >>
>> >> In r10-4525, and r10-4161, loop unroller was enabled for simply loops at 
>> >> -O2.
>> >> At the same time, the GIMPLE cunroll is also enabled, while it is not 
>> >> only for
>> >> simple loops.  This patch introduces a hook to check if a loop is 
>> >> suitable to
>> >> unroll completely.  The hook can be used to check if a loop is simply 
>> >> enough
>> >> for complete unroll, then avoid negative effects like size increasing or
>> >> complex loop unrolling at -O2.
>> >>
>> >> This patch could help to handle the code in PR95018 which contains complex
>> >> loop, and does not cause obvious performance change on SPEC2017.
>> >>
>> >> Bootstrap/regtest pass on powerpc64le. OK for trunk?
>> >
>> > I fear you're just digging the rs6000 specific hole you dug up deeper
>> > with this ...
>>
>> Yes, this is because r10-4525 and r10-4161 adjust RTL unroller hook for
>> rs6000, and they have the effect to enable GIMPLE cunroll at O2 for all
>> loops.  So, I introduce this patch to adjust GIMPLE cunroll similarly.
>
> I think a better approach may be to leave GIMPLE unroll untouched by
> changing the earlier patch to set a new flag_rtl_unroll_loops
> (where -funroll-loops would enable -frtl-unroll-loops) and gate the RTL
> unroller on flag_rtl_unroll_loops which you can enable separately
> on rs6000.
>
> That is, not add even more rs6000 specialities but restrict the earlier
> one to what it was supposed to do - the GIMPLE effects were clearly
> not your goal.


I also lean to enable GIMPLE cunroll for O2 (at that time only for
rs6000), especially for those loops which are simple and not increase
size too much.

Thanks Richard.

>
> Richard.
>
>> Thanks,
>> Jiufu
>>
>> >
>> > Richard.
>> >
>> >> Jiufu
>> >>
>> >> 2020-05-20  Jiufu Guo   <guoji...@cn.ibm.com>
>> >>
>> >>         PR target/95018
>> >>         * target.def (loop_allow_unroll_completely_peel): New hook.
>> >>         * doc/tm.texi (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): New 
>> >> hook.
>> >>         * doc/tm.texi.in (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): New 
>> >> hook.
>> >>         * config/rs6000/rs6000.c 
>> >> (TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL): New hook.
>> >>         (rs6000_loop_allow_unroll_completely_peel): New hook 
>> >> implementation.
>> >>         * tree-ssa-loop-ivcanon.c 
>> >> (canonicalize_loop_induction_variables): Call
>> >>         hook loop_allow_unroll_completely_peel.
>> >>
>> >>
>> >> ---
>> >>  gcc/config/rs6000/rs6000.c  | 19 +++++++++++++++++++
>> >>  gcc/doc/tm.texi             |  9 +++++++++
>> >>  gcc/doc/tm.texi.in          |  2 ++
>> >>  gcc/target.def              | 12 ++++++++++++
>> >>  gcc/tree-ssa-loop-ivcanon.c |  5 +++++
>> >>  5 files changed, 47 insertions(+)
>> >>
>> >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> >> index 8435bc15d72..a1a3f9cb583 100644
>> >> --- a/gcc/config/rs6000/rs6000.c
>> >> +++ b/gcc/config/rs6000/rs6000.c
>> >> @@ -1433,6 +1433,9 @@ static const struct attribute_spec 
>> >> rs6000_attribute_table[] =
>> >>  #undef TARGET_LOOP_UNROLL_ADJUST
>> >>  #define TARGET_LOOP_UNROLL_ADJUST rs6000_loop_unroll_adjust
>> >>
>> >> +#undef TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL
>> >> +#define TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL 
>> >> rs6000_loop_allow_unroll_completely_peel
>> >> +
>> >>  #undef TARGET_INIT_BUILTINS
>> >>  #define TARGET_INIT_BUILTINS rs6000_init_builtins
>> >>  #undef TARGET_BUILTIN_DECL
>> >> @@ -5139,6 +5142,22 @@ rs6000_loop_unroll_adjust (unsigned nunroll, 
>> >> struct loop *loop)
>> >>    return nunroll;
>> >>  }
>> >>
>> >> +/* Implement targetm.loop_allow_unroll_completely_peel.  */
>> >> +
>> >> +static bool
>> >> +rs6000_loop_allow_unroll_completely_peel (HOST_WIDE_INT maxiter, tree 
>> >> niter,
>> >> +                                         struct loop *loop)
>> >> +{
>> >> +  if (unroll_only_small_loops && optimize == 2)
>> >> +    {
>> >> +      if (maxiter >= 4
>> >> +         && !(TREE_CODE (niter) == INTEGER_CST && single_exit (loop)))
>> >> +       return false;
>> >> +    }
>> >> +
>> >> +  return true;
>> >> +}
>> >> +
>> >>  /* Handler for the Mathematical Acceleration Subsystem (mass) interface 
>> >> to a
>> >>     library with vectorized intrinsics.  */
>> >>
>> >> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> >> index 6e7d9dc54a9..f7419872c2f 100644
>> >> --- a/gcc/doc/tm.texi
>> >> +++ b/gcc/doc/tm.texi
>> >> @@ -11893,6 +11893,15 @@ is required only when the target has special 
>> >> constraints like maximum
>> >>  number of memory accesses.
>> >>  @end deftypefn
>> >>
>> >> +@deftypefn {Target Hook} bool TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL 
>> >> (HOST_WIDE_INT @var{maxiter}, tree @var{niter}, class loop *@var{loop})
>> >> +This target hook check if the loop @var{loop} is suitable to unroll
>> >> +completely on the target. The parameter @var{maxiter} is the possible max
>> >> +bound of iterations. The parameter @var{niter} is the number expression 
>> >> of
>> >> +iterations the loop is executed. The parameter @var{loop} is a pointer to
>> >> +the loop. This target hook is required only when the target has special
>> >> +constraints.
>> >> +@end deftypefn
>> >> +
>> >>  @defmac POWI_MAX_MULTS
>> >>  If defined, this macro is interpreted as a signed integer C expression
>> >>  that specifies the maximum number of floating point multiplications
>> >> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
>> >> index 3be984bbd5c..4ae079a650c 100644
>> >> --- a/gcc/doc/tm.texi.in
>> >> +++ b/gcc/doc/tm.texi.in
>> >> @@ -8010,6 +8010,8 @@ lists.
>> >>
>> >>  @hook TARGET_LOOP_UNROLL_ADJUST
>> >>
>> >> +@hook TARGET_LOOP_ALLOW_UNROLL_COMPLETELY_PEEL
>> >> +
>> >>  @defmac POWI_MAX_MULTS
>> >>  If defined, this macro is interpreted as a signed integer C expression
>> >>  that specifies the maximum number of floating point multiplications
>> >> diff --git a/gcc/target.def b/gcc/target.def
>> >> index 07059a87caf..3842ba611a2 100644
>> >> --- a/gcc/target.def
>> >> +++ b/gcc/target.def
>> >> @@ -2709,6 +2709,18 @@ number of memory accesses.",
>> >>   unsigned, (unsigned nunroll, class loop *loop),
>> >>   NULL)
>> >>
>> >> +/* Return if a loop is suitable for complete unroll.  */
>> >> +DEFHOOK
>> >> +(loop_allow_unroll_completely_peel,
>> >> + "This target hook check if the loop @var{loop} is suitable to unroll\n\
>> >> +completely on the target. The parameter @var{maxiter} is the possible 
>> >> max\n\
>> >> +bound of iterations. The parameter @var{niter} is the number expression 
>> >> of\n\
>> >> +iterations the loop is executed. The parameter @var{loop} is a pointer 
>> >> to\n\
>> >> +the loop. This target hook is required only when the target has 
>> >> special\n\
>> >> +constraints.",
>> >> + bool, (HOST_WIDE_INT maxiter, tree niter, class loop *loop),
>> >> + NULL)
>> >> +
>> >>  /* True if X is a legitimate MODE-mode immediate operand.  */
>> >>  DEFHOOK
>> >>  (legitimate_constant_p,
>> >> diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
>> >> index 8ab6ab3330c..52d45a90a56 100644
>> >> --- a/gcc/tree-ssa-loop-ivcanon.c
>> >> +++ b/gcc/tree-ssa-loop-ivcanon.c
>> >> @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
>> >>  #include "system.h"
>> >>  #include "coretypes.h"
>> >>  #include "backend.h"
>> >> +#include "target.h"
>> >>  #include "tree.h"
>> >>  #include "gimple.h"
>> >>  #include "cfghooks.h"
>> >> @@ -1231,6 +1232,10 @@ canonicalize_loop_induction_variables (class loop 
>> >> *loop,
>> >>    /* Force re-computation of loop bounds so we can remove redundant 
>> >> exits.  */
>> >>    maxiter = max_loop_iterations_int (loop);
>> >>
>> >> +  if (allow_peel && targetm.loop_allow_unroll_completely_peel
>> >> +      && !targetm.loop_allow_unroll_completely_peel (maxiter, niter, 
>> >> loop))
>> >> +    return false;
>> >> +
>> >>    if (dump_file && (dump_flags & TDF_DETAILS)
>> >>        && TREE_CODE (niter) == INTEGER_CST)
>> >>      {
>> >> --
>> >> 2.17.1
>> >>

Re: [PATCH 1/2] rs6000: tune cunroll for simple loops at O2

Reply via email to