On Tue, Dec 10, 2024 at 11:04 AM Feng Xue OS
<f...@os.amperecomputing.com> wrote:
>
> Thanks, and please see my comments as below:
>
> >> Currently, if could, scev-cprop unconditionally replaces loop closed ssa 
> >> with
> >> an expression built from loop initial value and loop niter, which might 
> >> cause
> >> redundant code-gen when all interior computations related to IV inside loop
> >> are also neccessary. As example, for the below case:
> >>
> >>     p = init_addr;
> >>
> >>     for (i = 0; i < N; i++)
> >>       {
> >>         p++;
> >>         *p = ...;
> >>       }
> >>
> >>     . = p;
> >>
> >> Then scev-cprop would end up with code:
> >>
> >>     p = init_addr;
> >>
> >>     for (i = 0; i < N; i++)
> >>       {
> >>         p++;
> >>         *p = ...;
> >>       }
> >>
> >>     . = init_addr + N; // Redundant computation
> >>
> >> For bitmask-manipulation loop, it may result in more and costy 
> >> re-evaluation,
> >> such as popcount. To target the issue, we need a means as statement 
> >> necessity
> >> propagation used in DCE, to figure out if impacted IVs are really needed. 
> >> As
> >> pointed out by Richard, we could wire scev-cprop into DCE, here this patch
> >> makes the thing.
> >>
> >> But one difference is that we consider retaining scev-cprop pass, and 
> >> extends
> >> its opt flag to support both this new (-ftree-scev-cprop[=1] by default) 
> >> and
> >> the original (-ftree-scev-cprop=2). In reality, I think the new way could
> >> get us more compact and faster code at most occasions, however, it is 
> >> possible
> >> the original handling might be better, because replacement could impact 
> >> folding
> >> of statements following loop, for example,
> >>
> >>     p = init_addr;
> >>
> >>     for (i = 0; i < N; i++)
> >>       {
> >>         p++;
> >>         *p = ...;
> >>       }
> >>
> >>     p1 = p;
> >>
> >>     ...
> >>
> >>     a = p1 - init_addr;  // a = (init_addr + N) - init_addr = N
> >>     b = p1 - N;          // b = (init_addr + N) - N = init_addr
> >>
> >> It is hard to take this into cost-model consideration, in that global-wide
> >> check on folding opportunities might be time-consuming, and the above case 
> >> is
> >> not that common. Therefore, as a backup, we leave the original means still
> >> there, so that give user an ability to enable it when some case matches 
> >> with
> >> the scenario.
> >
> > Thanks for working on this.  I don't think we want to preserve the "old" way
> > in a conditional way.  There's also already pass_cd_dce two passes after
> > the old scev_cprop, so adding another effective DCE pass looks excessive to 
> > me.
> >
> > As Honza said in another PR (wrt loop split), passes might invoke final 
> > value
> > replacement on select IVs themselves.
> >
> >> Thanks,
> >> Feng
> >> ---
> >> gcc/
> >>         PR tree-optimization/90594
> >>         * common.opt (ftree-scev-cprop=): New option.
> >>         (ftree-scev-cprop): Change it to be alias of ftree-scev-cprop=.
> >
> > So this would become a gate on the DCE functionality.
>
> OK. And I think we need one more control on triggering times of scev-cprop in
> DCE, as I could image, there are three schemes:
>
>    1. just only once, as originally. We need to add another pass parameter 
> for DCE,
>        and explicitly specifies it in some pass_dce or pass_cd_dce.

Yes.  I'd only ever do it for cd_dce since only that pass would be
able to elide a loop
completely.

>    2. only enabled when loop structure is initialized. We could do that by 
> relying on
>        existing check of "scev_initialized_p"

If you do it from the CD-DCE instance next to SCCP currently that only
gets run when
loops are (were) there.

>    3. every time when DCE is called. This would lead to compile-time increase 
> since
>        as a basic utility pass, DCE is called quite a few times.
>
> >
> > Some more comments inline - note I think this should now wait for next
> > stage1,
>
> stage1 of gcc-16?

Yes.

> > but the refactored API could be used for example to solve the
>
> It is OK to check-in the first patch in this release?

Yes.

> > loop-split issue.
> >
> >>         * tree-scalar-evolution.cc (simple_scev_final_value): Make it be
> >>         global function.
> >>         (apply_scev_final_value_replacement): Likewise.
> >>         * tree-scalar-evolution.h (scev_const_prop): Remove declaration.
> >>         (simple_scev_final_value): Add new declaration.
> >>         (apply_scev_final_value_replacement): Likewise.
> >>         * tree-ssa-dce.cc (stmt_stats): Add new field sccp_replaced_phis.
> >>         (scev_cprop_entry): New struct.
> >>         (scev_cprop_level): New static variable.
> >>         (scev_cprop_map): Likewise.
> >>         (mark_expr_operand_necessary): New function.
> >>         (get_loop_closed_phi_scev_replacement): Likewise.
> >>         (propagate_necessity): Change neccssity propagation for loop closed
> >>         phi when scev-cpropr is enabled.
> >>         (fold_scev_cprop_entry): New function.
> >>         (remove_dead_phis): Rename to replace_or_remove_phis. And do scev
> >>         final value replacement for loop closed phi.
> >>         (eliminate_unnecessary_stmts): Changed to call 
> >> replace_or_remove_phis.
> >>         (print_stats): Print stats for replaced phi.
> >>         (tree_dce_init): Initialize scev_cprop_map.
> >>         (tree_dce_done): Delete scev_cprop_map.
> >>         (perform_tree_ssa_dce): Make it be global function. Add scev-cprop
> >>         specific handling.
> >>         * tree-ssa-dce.h (perform_tree_ssa_dce): Add new declaration.
> >>         * tree-ssa-loop.cc (pass_scev_cprop::execute): Changed to call
> >>         perform_tree_ssa_dce.
> >> ---
> >>  gcc/common.opt               |   9 +-
> >>  gcc/tree-scalar-evolution.cc |   6 +-
> >>  gcc/tree-scalar-evolution.h  |   3 +-
> >>  gcc/tree-ssa-dce.cc          | 251 ++++++++++++++++++++++++++++++++---
> >>  gcc/tree-ssa-dce.h           |   1 +
> >>  gcc/tree-ssa-loop.cc         |   8 +-
> >>  6 files changed, 249 insertions(+), 29 deletions(-)
> >>
> >> diff --git a/gcc/common.opt b/gcc/common.opt
> >> index a42537c5f1e..98210ed72fd 100644
> >> --- a/gcc/common.opt
> >> +++ b/gcc/common.opt
> >> @@ -3425,8 +3425,15 @@ ftree-vect-loop-version
> >>  Common Ignore
> >>  Does nothing. Preserved for backward compatibility.
> >>
> >> +; If this option is 1, only perform scev cprop when all statements to 
> >> evaluate
> >> +; related IV inside loop could be eliminated, if it is 2, perform scev 
> >> cprop
> >> +; unconditionally.
> >> +ftree-scev-cprop=
> >> +Common Joined RejectNegative UInteger Var(flag_tree_scev_cprop) Init(1) 
> >> Optimization IntegerRange(0, 2)
> >> +Enable copy propagation of scalar-evolution information.
> >> +
> >>  ftree-scev-cprop
> >> -Common Var(flag_tree_scev_cprop) Init(1) Optimization
> >> +Common Alias(ftree-scev-cprop=,1,0)
> >>  Enable copy propagation of scalar-evolution information.
> >>
> >>  ftrivial-auto-var-init=
> >> diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> >> index 5733632aa78..9e51b18b237 100644
> >> --- a/gcc/tree-scalar-evolution.cc
> >> +++ b/gcc/tree-scalar-evolution.cc
> >> @@ -3381,7 +3381,7 @@ scev_finalize (void)
> >>  }
> >>
> >>  /* Returns true if the expression EXPR is considered to be too expensive
> >> -   for scev_const_prop.  Sets *COND_OVERFLOW_P to true when the
> >> +   for scev const prop.  Sets *COND_OVERFLOW_P to true when the
> >>     expression might contain a sub-expression that is subject to undefined
> >>     overflow behavior and conditionally evaluated.  */
> >>
> >> @@ -3782,7 +3782,7 @@ analyze_and_compute_bitop_with_inv_effect (class 
> >> loop* loop, tree phidef,
> >>     final value to avoid overflow UB when replacement would really happen
> >>     later.  */
> >>
> >> -static bool
> >> +bool
> >>  simple_scev_final_value (class loop *loop, tree value, tree *final_value,
> >>                          bool *rewrite_overflow)
> >
> > It might be useful to externalize the cost decision - for example if
> > we can elide
> > the full loop we might be more forgiving.  Or if analysis merely wants to 
> > use
> > the final value for analysis purposes rather than emit it, it might
> > not care for costs
> > at all.
> >
> >>  {
> >> @@ -3877,7 +3877,7 @@ simple_scev_final_value (class loop *loop, tree 
> >> value, tree *final_value,
> >>     have to leave an unused copy assignment around, if so, ALWAYS_KEEP is 
> >> set
> >>     to true.  */
> >>
> >> -static void
> >> +void
> >>  apply_scev_final_value_replacement (gphi *phi, tree final_value,
> >>                                     bool rewrite_overflow, bool 
> >> always_keep)
> >>  {
> >> diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
> >> index 41ba6b237b5..4510e0c52e9 100644
> >> --- a/gcc/tree-scalar-evolution.h
> >> +++ b/gcc/tree-scalar-evolution.h
> >> @@ -35,7 +35,8 @@ extern tree instantiate_scev (edge, class loop *, tree);
> >>  extern tree resolve_mixers (class loop *, tree, bool *);
> >>  extern void gather_stats_on_scev_database (void);
> >>  extern bool final_value_replacement_loop (class loop *);
> >> -extern unsigned int scev_const_prop (void);
> >> +extern bool simple_scev_final_value (class loop *, tree, tree *, bool *);
> >> +extern void apply_scev_final_value_replacement (gphi *, tree, bool, bool);
> >>  extern bool expression_expensive_p (tree, bool *);
> >>  extern bool simple_iv_with_niters (class loop *, class loop *, tree,
> >>                                    struct affine_iv *, tree *, bool);
> >> diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
> >> index ad3ac2785cf..5a667505dfb 100644
> >> --- a/gcc/tree-ssa-dce.cc
> >> +++ b/gcc/tree-ssa-dce.cc
> >> @@ -59,8 +59,10 @@ along with GCC; see the file COPYING3.  If not see
> >>  #include "tree-eh.h"
> >>  #include "gimplify.h"
> >>  #include "gimple-iterator.h"
> >> +#include "gimplify-me.h"
> >>  #include "tree-cfg.h"
> >>  #include "tree-ssa-loop-niter.h"
> >> +#include "tree-ssa-loop-manip.h"
> >>  #include "tree-into-ssa.h"
> >>  #include "tree-dfa.h"
> >>  #include "cfgloop.h"
> >> @@ -77,8 +79,16 @@ static struct stmt_stats
> >>    int total_phis;
> >>    int removed;
> >>    int removed_phis;
> >> +  int sccp_replaced_phis;
> >>  } stats;
> >>
> >> +struct scev_cprop_entry
> >> +{
> >> +  tree value;
> >> +  bool rewrite_overflow;
> >> +  bool visited;
> >> +};
> >> +
> >>  #define STMT_NECESSARY GF_PLF_1
> >>
> >>  static vec<gimple *> worklist;
> >> @@ -94,6 +104,14 @@ static sbitmap last_stmt_necessary;
> >>  /* Vector indicating that BB contains statements that are live.  */
> >>  static sbitmap bb_contains_live_stmts;
> >>
> >> +/* Control level of scev cprop.  */
> >> +static int scev_cprop_level;
> >> +
> >> +/* For a loop closed PHI, if the induction variable at loop exit could be
> >> +   calculated using initial value and loop niter, we would add a map 
> >> between
> >> +   definition of the PHI and the induction final value.  */
> >> +static hash_map<tree, scev_cprop_entry> *scev_cprop_map;
> >> +
> >>  /* Before we can determine whether a control branch is dead, we need to
> >>     compute which blocks are control dependent on which edges.
> >>
> >> @@ -241,6 +259,18 @@ mark_operand_necessary (tree op)
> >>    worklist.safe_push (stmt);
> >>  }
> >>
> >> +/* Mark operand stored in TP as necessary if it is a ssa name.  */
> >> +
> >> +tree
> >> +mark_expr_operand_necessary (tree *tp, int *walk_subtrees 
> >> ATTRIBUTE_UNUSED,
> >> +                             void *data ATTRIBUTE_UNUSED)
> >> +{
> >> +  if (TREE_CODE (*tp) == SSA_NAME)
> >> +    mark_operand_necessary (*tp);
> >> +
> >> +  return NULL_TREE;
> >> +}
> >> +
> >>  /* Return true if STMT is a call to allocation function that can be
> >>     optimized out if the memory block is never used for anything else
> >>     then NULL pointer check or free.
> >> @@ -765,6 +795,49 @@ valid_new_delete_pair_p (gimple *new_call, gimple 
> >> *delete_call)
> >>    return valid_new_delete_pair_p (new_asm, delete_asm);
> >>  }
> >>
> >> +/* Given a PHI, check if it is a loop closed PHI, and related induction
> >> +   variable has a simple final value that could be directly calculated
> >> +   with its initial value and loop niter.  If satisfied, insert a new
> >> +   entry to describe this when ADD_NEW is true, or return the existing
> >> +   matched entry when ADD_NEW is false, otherwise, return NULL.  */
> >> +
> >> +static scev_cprop_entry *
> >> +get_loop_closed_phi_scev_replacement (gphi *phi, bool add_new)
> >> +{
> >> +  /* Do nothing if scep prop is not enabled or this is definitely
> >> +     not a loop closed PHI.  */
> >> +  if (scev_cprop_level < 1 || gimple_phi_num_args (phi) != 1)
> >
> > Note LC PHIs can have multiple preds - but indeed it gets difficult
>
> Do you mean that a LC PHI may contain def that is not in a loop, as
>
>    v = PHI (v1<In loop>, v2<not in loop>);

Yes, or two defs from two different loops.

> > here so I'd only change the comment.  All PHIs on loop exits are
> > LC PHI nodes [if their arg is defined in the loop].
> >
> > So ...
> >
> >> +    return NULL;
> >> +
> >> +  tree result = gimple_phi_result (phi);
> >> +
> >> +  if (!add_new)
> >> +    return scev_cprop_map->get (result);
> >> +
> >> +  edge exit = single_pred_edge (gimple_bb (phi));
> >> +  tree arg = gimple_phi_arg_def (phi, 0);
> >> +
> >> +  if (TREE_CODE (arg) != SSA_NAME)
> >> +    return NULL;
> >> +
> >> +  class loop *loop = exit->src->loop_father;
> >> +  scev_cprop_entry entry = {};
> >> +  bool existed;
> >> +
> >> +  if (!simple_scev_final_value (loop, arg, &entry.value,
> >> +                               &entry.rewrite_overflow))
> >> +    return NULL;
> >> +
> >> +  auto &new_entry = scev_cprop_map->get_or_insert (result, &existed);
> >> +
> >> +  /* Based on the way of DCE propagation, an expression would not be 
> >> handled
> >> +     more than once.  */
> >> +  gcc_assert (!existed);
> >> +  new_entry = entry;
> >> +
> >> +  return &new_entry;
> >> +}
> >> +
> >>  /* Propagate necessity using the operands of necessary statements.
> >>     Process the uses on each statement in the worklist, and add all
> >>     feeding statements which contribute to the calculation of this
> >> @@ -817,12 +890,33 @@ propagate_necessity (bool aggressive)
> >>           gphi *phi = as_a <gphi *> (stmt);
> >>           size_t k;
> >>
> >> -         for (k = 0; k < gimple_phi_num_args (stmt); k++)
> >> -            {
> >> -             tree arg = PHI_ARG_DEF (stmt, k);
> >> -             if (TREE_CODE (arg) == SSA_NAME)
> >> -               mark_operand_necessary (arg);
> >> -           }
> >> +         /* When scev cprop is enabled, computation of induction variable
> >> +            might not be really needed, if its final value at loop exit 
> >> could
> >> +            be directly calculated using initial value and loop niter, and
> >> +            any interior evaluation around the induction variable does not
> >> +            participate in other necessary statements in the loop.  So we 
> >> only
> >> +            propagate necessity through ssa operands in the final value.
> >> +            An example:
> >> +
> >> +            v = init;
> >> +
> >> +            for (i = 0; i < N; i++)
> >> +              v += step;
> >> +
> >> +            . = v;
> >> +
> >> +            The loop could be completely removed, and replaced with a 
> >> simple
> >> +            evaluation as: v = init + step * N, therefore, only "init", 
> >> "step"
> >> +            and "N" are actually necessary.  */
> >> +         if (auto *entry = get_loop_closed_phi_scev_replacement (phi, 
> >> true))
> >> +           walk_tree (&entry->value, mark_expr_operand_necessary, NULL, 
> >> NULL);
> >> +         else
> >> +           for (k = 0; k < gimple_phi_num_args (stmt); k++)
> >> +             {
> >
> > ... I'd say you want to do get_loop_closed_phi_scev_replacement iff
> > PHI_ARG_EDGE is a loop exit edge only.  Having not all edges
> > loop exits or being replaceable might turn out not easily supported
> > (you'd need to split the edge, re-creating unrelated LC PHIs on the
> > original loop exit), so only doing it for single_pred blocks might make
> > sense for simplicity.
>
> In get_loop_closed_phi_scev_replacement, I requires gimple_phi_num_args(phi) 
> must be 1,
> which is equivalent to single_pred block.

Yes, that's because "code generation" only can deal with that case
since it emits code
in the block of the PHI rather than on the exit edge.

I still think that get_loop_closed_phi_scev_replacement should be
engineered in a less
restrictive way and the additional limitation imposed by the caller.

> >
> >> +               tree arg = PHI_ARG_DEF (stmt, k);
> >> +               if (TREE_CODE (arg) == SSA_NAME)
> >> +                 mark_operand_necessary (arg);
> >> +             }
> >>
> >>           /* For PHI operands it matters from where the control flow 
> >> arrives
> >>              to the BB.  Consider the following example:
> >> @@ -1104,10 +1198,80 @@ propagate_necessity (bool aggressive)
> >>      }
> >>  }
> >>
> >> -/* Remove dead PHI nodes from block BB.  */
> >> +/* Try to fold the final value of ENTRY to a constant or ssa name.  Since
> >> +   one entry may refer ssa name that is described by other entry, so this
> >> +   function would recursively process folding along the def/use relation.
> >> +   If the processed value does not belong to any scev_cprop entry, it is
> >> +   stored in VALUE_PTR, and ENTRY is set to NULL.  Return true if the 
> >> value
> >> +   does be simplified.  */
> >>
> >>  static bool
> >> -remove_dead_phis (basic_block bb)
> >> +fold_scev_cprop_entry (scev_cprop_entry *entry, tree *value_ptr = NULL)
> >> +{
> >> +  if (entry)
> >> +    {
> >> +      if (entry->visited || CONSTANT_CLASS_P (entry->value))
> >> +       return true;
> >> +
> >
> > Hmm, how does this work?  You are using pointer equality here so
> > rely on the SCEV final value of two PHIs that are dependent on each
> > other to have pointer-equivalent sub-expressions?  The final value
> > of a PHI does obviously not refer to the PHI result of another LC PHI
> > here?
>
>  See this case:
>
>     int foo()
>     {
>        int i, j, m, n;
>
>        m = 0;
>        for (i = 0; i < 1000; i++)
>           m += 3;
>
>        n = 0;
>        for (j = 0; j < 2000; j++)
>           n += m;
>
>        return n;
>     }
>
>   The final value of LC PHI that defines "n" would refer to result of LC PHI 
> that
>   defines "m".
>
>   When this function tries to fold the final value of "n", it will 
> recursively fold
>   that of "m", finally we could get a constant for "n".

Ah, I see.  So this requires more comments I think.  I thought this was for

  for (j = 0; j < m; j++)
     n = j * 200;
  return n + j;

where the final value of n is 200 * m and the final value of j is m
(now imagine more complex 'm'), and the re-use be used to make
the replacement cheaper when you need to preserve multiple
related final values.

> >> +      entry->visited = true;
> >> +      entry->value = unshare_expr (entry->value);
> >> +      value_ptr = &entry->value;
> >> +    }
> >> +
> >> +  tree value = *value_ptr;
> >> +
> >> +  if (TREE_CODE (value) == SSA_NAME)
> >> +    {
> >> +      scev_cprop_entry *def_entry;
> >> +
> >> +      if (!SSA_NAME_OCCURS_IN_ABNORMAL_PHI (value)
> >> +         && (def_entry = scev_cprop_map->get (value)))
> >> +       {
> >> +         fold_scev_cprop_entry (def_entry);
> >> +
> >> +         if (CONSTANT_CLASS_P (def_entry->value) ||
> >
> > ||s go to the next line.
> >
> >> +             TREE_CODE (def_entry->value) == SSA_NAME)
> >> +          {
> >> +            *value_ptr = def_entry->value;
> >> +            return true;
> >> +          }
> >> +       }
> >> +
> >> +      return false;
> >> +    }
> >> +
> >> +  bool changed = false;
> >> +
> >> +  if (EXPR_P (value))
> >> +    {
> >> +      for (int i = 0; i < TREE_OPERAND_LENGTH (value); i++)
> >> +       {
> >> +         tree &opnd = TREE_OPERAND (value, i);
> >> +
> >> +         if (opnd && !CONSTANT_CLASS_P (opnd))
> >> +           changed |= fold_scev_cprop_entry (NULL, &opnd);
> >> +       }
> >> +    }
> >> +
> >> +  if (changed)
> >> +    {
> >> +      /* Only fold the value if any of its operand has been folded.  */
> >> +      tree new_value = fold (value);
> >> +
> >> +      if (new_value == value)
> >> +       changed = false;
> >> +      else
> >> +       *value_ptr = new_value;
> >> +    }
> >> +
> >> +  return changed;
> >> +}
> >> +
> >> +/* Remove dead PHI nodes from block BB, and try to replace loop closed 
> >> PHIs
> >> +   with scev final values.  */
> >> +
> >> +static bool
> >> +replace_or_remove_phis (basic_block bb)
> >>  {
> >>    bool something_changed = false;
> >>    gphi *phi;
> >> @@ -1158,6 +1322,38 @@ remove_dead_phis (basic_block bb)
> >>           stats.removed_phis++;
> >>           continue;
> >>         }
> >> +      else if (auto *entry = get_loop_closed_phi_scev_replacement (phi, 
> >> false))
> >> +       {
> >> +         tree arg = gimple_phi_arg_def (phi, 0);
> >> +
> >> +         fold_scev_cprop_entry (entry);
> >> +
> >> +         if (scev_cprop_level > 1
> >> +             || !bitmap_bit_p (processed, SSA_NAME_VERSION (arg))
> >> +             || CONSTANT_CLASS_P (entry->value)
> >> +             || TREE_CODE (entry->value) == SSA_NAME)
> >> +           {
> >> +             something_changed = true;
> >> +             stats.sccp_replaced_phis++;
> >> +             if (dump_file && (dump_flags & TDF_DETAILS))
> >> +               {
> >> +                 fprintf (dump_file, "Replacing : ");
> >> +                 print_gimple_stmt (dump_file, phi, 0, TDF_SLIM);
> >> +                 fprintf (dump_file, "  with    : ");
> >> +                 print_generic_expr (dump_file, gimple_phi_result (phi));
> >> +                 fprintf (dump_file, " = ");
> >> +                 print_generic_expr (dump_file, entry->value);
> >> +                 fprintf (dump_file, ";\n\n");
> >> +               }
> >> +
> >> +             /* Advance iterator before the PHI is removed.  */
> >> +             gsi_next (&gsi);
> >> +             apply_scev_final_value_replacement (phi, entry->value,
> >> +                                                 entry->rewrite_overflow,
> >> +                                                 false);
> >> +             continue;
> >> +           }
> >> +       }
> >>
> >>        gsi_next (&gsi);
> >>      }
> >> @@ -1611,7 +1807,7 @@ eliminate_unnecessary_stmts (bool aggressive)
> >>         }
> >>
> >>        /* Remove dead PHI nodes.  */
> >> -      something_changed |= remove_dead_phis (bb);
> >> +      something_changed |= replace_or_remove_phis (bb);
> >>      }
> >>
> >>    /* First remove queued edges.  */
> >> @@ -1755,6 +1951,11 @@ print_stats (void)
> >>
> >>    fprintf (dump_file, "Removed %d of %d PHI nodes (%d%%)\n",
> >>            stats.removed_phis, stats.total_phis, (int) percg);
> >> +
> >> +  if (stats.sccp_replaced_phis)
> >> +    fprintf (dump_file, "Replaced %d PHI node%s by SCCP\n",
> >> +            stats.sccp_replaced_phis,
> >> +            stats.sccp_replaced_phis > 1 ? "s" : "");
> >>  }
> >>
> >>  /* Initialization for this pass.  Set up the used data structures.  */
> >> @@ -1775,6 +1976,11 @@ tree_dce_init (bool aggressive)
> >>    processed = sbitmap_alloc (num_ssa_names + 1);
> >>    bitmap_clear (processed);
> >>
> >> +  if (scev_cprop_level > 0)
> >> +    scev_cprop_map = new hash_map<tree, scev_cprop_entry>();
> >> +  else
> >> +    scev_cprop_map = NULL;
> >> +
> >>    worklist.create (64);
> >>    cfg_altered = false;
> >>  }
> >> @@ -1795,6 +2001,9 @@ tree_dce_done (bool aggressive)
> >>
> >>    sbitmap_free (processed);
> >>
> >> +  if (scev_cprop_map)
> >> +    delete scev_cprop_map;
> >> +
> >>    worklist.release ();
> >>  }
> >>
> >> @@ -2005,23 +2214,29 @@ make_forwarders_with_degenerate_phis (function *fn)
> >>     In aggressive mode, control dependences are taken into account, which
> >>     results in more dead code elimination, but at the cost of some time.  
> >> */
> >>
> >> -static unsigned int
> >> -perform_tree_ssa_dce (bool aggressive)
> >> +unsigned int
> >> +perform_tree_ssa_dce (bool aggressive, int scev_cprop = 0)
> >>  {
> >>    bool something_changed = 0;
> >>    unsigned todo = 0;
> >> +  bool need_loop = aggressive || scev_cprop > 0;
> >>
> >>    /* Preheaders are needed for SCEV to work.
> >>       Simple lateches and recorded exits improve chances that loop will
> >>       proved to be finite in testcases such as in loop-15.c and loop-24.c  
> >> */
> >>    bool in_loop_pipeline = scev_initialized_p ();
> >> -  if (aggressive && ! in_loop_pipeline)
> >> +  if (need_loop && ! in_loop_pipeline)
> >>      {
> >>        loop_optimizer_init (LOOPS_NORMAL
> >>                            | LOOPS_HAVE_RECORDED_EXITS);
> >
> > add LOOP_CLOSED_SSA here iff scev_cprop, in_loop_pipeline ensures
> > we're in LC SSA.
> >
>
> Is it possible that in_loop_pipeline is true while LOOP_CLOSED_SSA form of
> a loop is broken? If this happen, it is a bug?

Yes, currently we only have SCEV initialized during the loop pipeline and
there LOOP_CLOSED_SSA is active and upon entry and exit to a pass
in the loop pipeline the form should be valid.

> >>        scev_initialize ();
> >>      }
> >>
> >> +  scev_cprop_level = scev_cprop;
> >> +
> >> +  if (scev_cprop > 0 && !loops_state_satisfies_p (LOOP_CLOSED_SSA))
> >> +    rewrite_into_loop_closed_ssa (NULL, 0);
> >> +
> >>    if (aggressive)
> >>      todo |= make_forwarders_with_degenerate_phis (cfun);
> >>
> >> @@ -2044,12 +2259,6 @@ perform_tree_ssa_dce (bool aggressive)
> >>
> >>    find_obviously_necessary_stmts (aggressive);
> >>
> >> -  if (aggressive && ! in_loop_pipeline)
> >> -    {
> >> -      scev_finalize ();
> >> -      loop_optimizer_finalize ();
> >> -    }
> >> -
> >>    longest_chain = 0;
> >>    total_chain = 0;
> >>    nr_walks = 0;
> >> @@ -2058,6 +2267,12 @@ perform_tree_ssa_dce (bool aggressive)
> >>    propagate_necessity (aggressive);
> >>    BITMAP_FREE (visited);
> >>
> >> +  if (need_loop && ! in_loop_pipeline)
> >> +    {
> >> +      scev_finalize ();
> >> +      loop_optimizer_finalize ();
> >> +    }
> >> +
> >>    something_changed |= eliminate_unnecessary_stmts (aggressive);
> >>    something_changed |= cfg_altered;
> >>
> >> diff --git a/gcc/tree-ssa-dce.h b/gcc/tree-ssa-dce.h
> >> index b0e92a58ea8..ef5b77ce36a 100644
> >> --- a/gcc/tree-ssa-dce.h
> >> +++ b/gcc/tree-ssa-dce.h
> >> @@ -18,5 +18,6 @@ along with GCC; see the file COPYING3.  If not see
> >>
> >>  #ifndef TREE_SSA_DCE_H
> >>  #define TREE_SSA_DCE_H
> >> +extern unsigned perform_tree_ssa_dce (bool, int = 0);
> >>  extern void simple_dce_from_worklist (bitmap, bitmap = nullptr);
> >>  #endif
> >> diff --git a/gcc/tree-ssa-loop.cc b/gcc/tree-ssa-loop.cc
> >> index 1d9afd98015..04fd58c0dc5 100644
> >> --- a/gcc/tree-ssa-loop.cc
> >> +++ b/gcc/tree-ssa-loop.cc
> >> @@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
> >>  #include "tree-ssa-loop-manip.h"
> >>  #include "tree-ssa-loop-niter.h"
> >>  #include "tree-ssa-loop.h"
> >> +#include "tree-ssa-dce.h"
> >>  #include "cfgloop.h"
> >>  #include "tree-inline.h"
> >>  #include "tree-scalar-evolution.h"
> >> @@ -402,14 +403,9 @@ public:
> >>  unsigned
> >>  pass_scev_cprop::execute (function *)
> >>  {
> >> -  bool any = false;
> >> -
> >>    /* Perform final value replacement in loops, in case the replacement
> >>       expressions are cheap.  */
> >> -  for (auto loop : loops_list (cfun, LI_FROM_INNERMOST))
> >> -    any |= final_value_replacement_loop (loop);
> >> -
> >> -  return any ? TODO_cleanup_cfg | TODO_update_ssa_only_virtuals : 0;
> >> +  return perform_tree_ssa_dce (false, flag_tree_scev_cprop);
> >>  }
> >>
> >>  } // anon namespace
> >> --
> >> 2.17.1
> >

Reply via email to