Re: Enable loop peeling at -O3

Marek Polacek Fri, 27 May 2016 06:51:10 -0700

On Fri, May 27, 2016 at 03:19:29PM +0200, Jan Hubicka wrote:
> Hi,
> this patch enables -fpeel-loops by default at -O3 and makes it use likely
> upper bound estimates.  The patch also adds a -fpeel-all-loops flag that is
> symmetric to -funroll-all-loops.  A long time ago we used to interpret
> -fpeel-loops this way and blindly peel every loop, but this behaviour got lost
> and now we only peel loops we have some evidence for.
> 
> Bootstrapped/regtested x86_64-linux, I am retesting after a last-minute change
> (adding the testcases). OK?
> 
> Honza
> 
>       * common.opt (flag_peel_all_loops): New option.
>       * doc/invoke.texi (-fpeel-loops): Update documentation.
>       (-fpeel-all-loops): Document.
>       * opts.c (default_options): Add OPT_fpeel_loops to -O3+.
>       * toplev.c (process_options): flag_peel_all_loops implies
>       flag_peel_loops.
>       * tree-ssa-loop-ivcanon.c (try_peel_loop): Update comment; handle
>       -fpeel-all-loops, use likely estimates.
> 
>       * gcc.dg/tree-ssa/peel1.c: New testcase.
>       * gcc.dg/tree-ssa/peel2.c: New testcase.
> Index: common.opt
> ===================================================================
> --- common.opt        (revision 236815)
> +++ common.opt        (working copy)
> @@ -1840,6 +1840,10 @@ fpeel-loops
>  Common Report Var(flag_peel_loops) Optimization
>  Perform loop peeling.
>  
> +fpeel-all-loops
> +Common Report Var(flag_peel_all_loops) Optimization
> +Perform loop peeling of all loops.
> +
>  fpeephole
>  Common Report Var(flag_no_peephole,0) Optimization
>  Enable machine specific peephole optimizations.
> Index: doc/invoke.texi
> ===================================================================
> --- doc/invoke.texi   (revision 236815)
> +++ doc/invoke.texi   (working copy)
> @@ -8661,10 +8661,17 @@ the loop is entered.  This usually makes
>  @item -fpeel-loops
>  @opindex fpeel-loops
>  Peels loops for which there is enough information that they do not
> -roll much (from profile feedback).  It also turns on complete loop peeling
> -(i.e.@: complete removal of loops with small constant number of iterations).
> +roll much (from profile feedback or static analysis).  It also turns on
> +complete loop peeling (i.e.@: complete removal of loops with small constant
> +number of iterations).
>  
> -Enabled with @option{-fprofile-use}.
> +Enabled with @option{-O3} and @option{-fprofile-use}.
> +
> +@item -fpeel-all-loops
> +@opindex fpeel-all-loops
> +Peel all loops, even if their number of iterations is uncertain when
> +the loop is entered.  For loops with large number of iterations this leads
> +to wasted code size.
>  
>  @item -fmove-loop-invariants
>  @opindex fmove-loop-invariants
> Index: opts.c
> ===================================================================
> --- opts.c    (revision 236815)
> +++ opts.c    (working copy)
> @@ -535,6 +535,7 @@ static const struct default_options defa
>      { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC },
>      { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 },
>      { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
> +    { OPT_LEVELS_3_PLUS, OPT_fpeel_loops, NULL, 1 },
>  
>      /* -Ofast adds optimizations to -O3.  */
>      { OPT_LEVELS_FAST, OPT_ffast_math, NULL, 1 },
> Index: testsuite/gcc.dg/tree-ssa/peel1.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/peel1.c (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/peel1.c (working copy)
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-loop-ivcanon" } */


This should probably be -fdump-tree-ivcanon-details.

> +struct foo {int b; int a[3];} foo;
> +void add(struct foo *a,int l)
> +{
> +  int i;
> +  for (i=0;i<l;i++)
> +    a->a[i]++;
> +}
> +/* { dg-final { scan-tree-dump "Loop likely 1 iterates at most 3 times." 1 "ivcanon"} } */
> +/* { dg-final { scan-tree-dump "Peeled loop 1, 4 times." 1 "ivcanon"} } */

And these should use scan-tree-dump-times, since they pass a count.  But even
with both changes the testcases don't pass for me.
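
For reference, a sketch of peel1.c with both of the above applied
(-fdump-tree-ivcanon-details in dg-options, scan-tree-dump-times in dg-final).
The dump strings are copied unchanged from the patch.  As said, I have not seen
this pass, so take it as the intended shape of the directives rather than a
working testcase.

```c
/* { dg-do compile } */
/* Assumed fix: dump the ivcanon pass with details instead of "loop-ivcanon".  */
/* { dg-options "-O3 -fdump-tree-ivcanon-details" } */
struct foo {int b; int a[3];} foo;
void add(struct foo *a,int l)
{
  int i;
  for (i=0;i<l;i++)
    a->a[i]++;
}
/* Assumed fix: scan-tree-dump-times takes the pattern, a count and the dump name.  */
/* { dg-final { scan-tree-dump-times "Loop likely 1 iterates at most 3 times." 1 "ivcanon"} } */
/* { dg-final { scan-tree-dump-times "Peeled loop 1, 4 times." 1 "ivcanon"} } */
```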

> Index: testsuite/gcc.dg/tree-ssa/peel2.c
> ===================================================================
> --- testsuite/gcc.dg/tree-ssa/peel2.c (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/peel2.c (working copy)
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fpeel-all-loops -fdump-tree-loop-ivcanon" } */
> +void add(int *a,int l)
> +{
> +  int i;
> +  for (i=0;i<l;i++)
> +    a[i]++;
> +}
> +/* { dg-final { scan-tree-dump "Loop likely 1 iterates at most 3 times." 1 "ivcanon"} } */

How do you determine "3 times"?  Isn't something missing here?

> +/* { dg-final { scan-tree-dump "Peeled loop 1, 4 times." 1 "ivcanon"} } */
> Index: toplev.c
> ===================================================================
> --- toplev.c  (revision 236815)
> +++ toplev.c  (working copy)
> @@ -1294,6 +1294,9 @@ process_options (void)
>    if (flag_unroll_all_loops)
>      flag_unroll_loops = 1;
>  
> +  if (flag_peel_all_loops)
> +    flag_peel_loops = 1;
> +
>    /* web and rename-registers help when run after loop unrolling.  */
>    if (flag_web == AUTODETECT_VALUE)
>      flag_web = flag_unroll_loops || flag_peel_loops;
> Index: tree-ssa-loop-ivcanon.c
> ===================================================================
> --- tree-ssa-loop-ivcanon.c   (revision 236816)
> +++ tree-ssa-loop-ivcanon.c   (working copy)
> @@ -951,7 +951,9 @@ try_peel_loop (struct loop *loop,
>    if (!flag_peel_loops || PARAM_VALUE (PARAM_MAX_PEEL_TIMES) <= 0)
>      return false;
>  
> -  /* Peel only innermost loops.  */
> +  /* Peel only innermost loops.
> +     While the code is perfectly capable of peeling non-innermost loops,
> +     the heuristics would probably need some improvements. */
>    if (loop->inner)
>      {
>        if (dump_file)
> @@ -969,12 +971,16 @@ try_peel_loop (struct loop *loop,
>    /* Check if there is an estimate on the number of iterations.  */
>    npeel = estimated_loop_iterations_int (loop);
>    if (npeel < 0)
> +    npeel = likely_max_loop_iterations_int (loop);
> +  if (npeel < 0 && flag_peel_all_loops)
> +    npeel = PARAM_VALUE (PARAM_MAX_PEEL_TIMES) - 1;
> +  if (npeel < 0)
>      {
>        if (dump_file)
>          fprintf (dump_file, "Not peeling: number of iterations is not "
>                "estimated\n");
>        return false;
>      }
>    if (maxiter >= 0 && maxiter <= npeel)
>      {
>        if (dump_file)

        Marek
