On Fri, 8 Dec 2023, Filip Kastl wrote:

> > > Hi,
> > > 
> > > this is a patch that I submitted two months ago as an RFC. I added some 
> > > polish
> > > since.
> > > 
> > > It is a new lightweight pass that removes redundant PHI functions and as a
> > > bonus does basic copy propagation. With Jan Hubi?ka we measured that it 
> > > is able
> > > to remove usually more than 5% of all PHI functions when run among early 
> > > passes
> > > (sometimes even 13% or more). Those are mostly PHI functions that would be
> > > later optimized away but with this pass it is possible to remove them 
> > > early
> > > enough so that they don't get streamed when runing LTO (and also 
> > > potentially
> > > inlined at multiple places). It is also able to remove some redundant PHIs
> > > that otherwise would still be present during RTL expansion.
> > > 
> > > Jakub Jel?nek was concerned about debug info coverage so I compiled 
> > > cc1plus
> > > with and without this patch. These are the sizes of .debug_info and
> > > .debug_loclists
> > > 
> > > .debug_info without patch 181694311
> > > .debug_info    with patch 181692320
> > > +0.0011% change
> > > 
> > > .debug_loclists without patch 47934753
> > > .debug_loclists    with patch 47934966
> > > -0.0004% change
> > > 
> > > I wanted to use dwlocstat to compare debug coverages but didn't manage to 
> > > get
> > > the program working on my machine sadly. Hope this suffices. Seems to me 
> > > that
> > > my patch doesn't have a significant impact on debug info.
> > > 
> > > Bootstraped and tested* on x86_64-pc-linux-gnu.
> > > 
> > > * One testcase (pr79691.c) did regress. However that is because the test 
> > > is
> > > dependent on a certain variable not being copy propagated. I will go into 
> > > more
> > > detail about this in a reply to this mail.
> > > 
> > > Ok to commit?
> > 
> > This is a second version of the patch.  In this version, I modified the
> > pr79691.c testcase so that it works as intended with other changes from the
> > patch.
> > 
> > The pr79691.c testcase checks that we get constants from snprintf calls and
> > that they simplify into a single constant.  The testcase doesn't account for
> > the fact that this constant may be further copy propagated which is exactly
> > what happens with this patch applied.
> > 
> > Bootstrapped and tested on x86_64-pc-linux-gnu.
> > 
> > Ok to commit?
> 
> This is the third version of the patch. In this version, I adressed most of
> Richards remarks about the second version. Here is a summary of changes I 
> made:
> 
> - Rename the pass from tree-ssa-sccopy.cc to gimple-ssa-sccopy.cc
> - Use simple_dce_from_worklist to remove propagated statements
> - Use existing replace_uses API instead of reinventing it
>   - This allowed me to get rid of some now redundant cleanup code
> - Encapsulate the SCC finding into a class
> - Rework stmt_may_generate_copy to get rid of redundant checks
> - Add check that PHI doesn't contain two non-SSA-name values to
>   stmt_may_generate_copy
> - Regarding alignment and value ranges in stmt_may_generate_copy: For now use
>   the conservative check that Richard suggested
> - Index array of vertices that SCC discovery uses by SSA name version numbers
>   instead of numbering statements myself.
> 
> 
> I didn't make any changes based on these remarks:
> 
> 1 It might be nice to optimize SCCs of size 1 somehow, not sure how
>   many times these appear - possibly prevent them from even entering
>   the SCC discovery?
> 
> It would be nice. But the only way to do this that I see right now is to first
> propagate SCCs of size 1 and then the rest. This would mean adding a new copy
> propagation procedure. It wouldn't be a trivial procedure. Efficiency of the
> pass relies on having SCCs topogically sorted so this procedure would have to
> implement some topological sort algorithm.
> 
> This could be done. It could save allocating some vec<>s (right now, SCCs of
> size 1 are represented by a vec<> with a single element). But is it worth it 
> to
> optimize the pass this way right now? If possible, I'd like to see that the
> pass works and sort out any problems people encounter with it before I start
> optimizing it.
> 
> 2 Instead of collecting all stmts that may generate a copy at the beginning of
>   the pass into a vec<>, let the SCC discovery check that statements may
>   generate a copy on the fly.
> 
> This would be a big change to the pass, it would require a lot of reworking.
> I'm also not sure if this would help reduce the number of allocated vec<>s 
> that
> much because I'll still want to represent SCCs by vec<>s.
> 
> Again - its possible I'll want to rework the pass in this way in the future 
> but
> I'd like to leave it as it is for now.
> 
> 3 Add a comment saying that the pass is doing optimistic copy propagation
> 
> I don't think the pass works in an optimistic way. It doesn't assume that all
> variables are copies of each other at any point. It instead identifies copy
> statements (or PHI SCCs that act as copy statements) and then for each of 
> these
> it propagates: By that I mean if a copy statement says that _3 is a copy of 
> _2,
> then it replaces all uses of _3 by _2.
> 
> But its possible that I still misinterpret what 'optimistic' means. If
> mentioning that it works in an optimistic way would help readability of the
> pass, I'm happy to add that comment.

I understood it handles copy cycles?  Consider

 _1 = _2;

bb3:
 # _3 = PHI <_1, _4>
 _4 = _3;
 if (...)
   break;
 else
   goto bb3;

It should see that _4 and _3 are a copy of _1 (and thus _2)?  When
processing the cycle comprising the defs of _3 and _4 don't you
then optimistically assume that _1 == _4 when processing the PHI?
I realize this might be "trivial" given the SCCs are constructed
from copies.  But still with say

 _3 = PHI <_1, _4>

 _4 = PHI <_3, _2>


the cycle would have two entries (it's irreducible) and neither
_3 nor _4 are a copy of _1 (or _2) unless _1 is a copy of _2
(or vice versa).


> Richard, is the patch ok as it is now? Or do you think that making changes 1, 
> 2
> or 3 is necessary? Or are there any other issues which should be adressed?

It's OK to not do the changes you skipped.

> Filip
> 
> 
> -- >8 --
> 
> This patch adds the strongly-connected copy propagation (SCCOPY) pass.
> It is a lightweight GIMPLE copy propagation pass that also removes some
> redundant PHI statements. It handles degenerate PHIs, e.g.:
> 
> _5 = PHI <_1>;
> _6 = PHI <_6, _6, _1, _1>;
> _7 = PHI <16, _7>;
> // Replaces occurences of _5 and _6 by _1 and _7 by 16
> 
> It also handles more complicated situations, e.g.:
> 
> _8 = PHI <_9, _10>;
> _9 = PHI <_8, _10>;
> _10 = PHI <_8, _9, _1>;
> // Replaces occurences of _8, _9 and _10 by _1
> 
> gcc/ChangeLog:
> 
>       * Makefile.in: Added sccopy pass.
>       * passes.def: Added sccopy pass before LTO streaming and before
>         RTL expanstion.
>       * tree-pass.h (make_pass_sccopy): Added sccopy pass.
>       * gimple-ssa-sccopy.cc: New file.
> 
> gcc/testsuite/ChangeLog:
> 
>       * gcc.dg/tree-ssa/pr79691.c: Updated scan-tree-dump to account
>         for additional copy propagation this patch adds.
>       * gcc.dg/sccopy-1.c: New test.
> 
> Signed-off-by: Filip Kastl <fka...@suse.cz>
> ---
>  gcc/Makefile.in                         |   1 +
>  gcc/gimple-ssa-sccopy.cc                | 687 ++++++++++++++++++++++++
>  gcc/passes.def                          |   3 +
>  gcc/testsuite/gcc.dg/sccopy-1.c         |  78 +++
>  gcc/testsuite/gcc.dg/tree-ssa/pr79691.c |   2 +-
>  gcc/tree-pass.h                         |   1 +
>  6 files changed, 771 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/gimple-ssa-sccopy.cc
>  create mode 100644 gcc/testsuite/gcc.dg/sccopy-1.c
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 753f2f36618..2e6f2fef1ba 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1497,6 +1497,7 @@ OBJS = \
>       gimple-ssa-backprop.o \
>       gimple-ssa-isolate-paths.o \
>       gimple-ssa-nonnull-compare.o \
> +     gimple-ssa-sccopy.o \
>       gimple-ssa-split-paths.o \
>       gimple-ssa-store-merging.o \
>       gimple-ssa-strength-reduction.o \
> diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
> new file mode 100644
> index 00000000000..b74f7a62569
> --- /dev/null
> +++ b/gcc/gimple-ssa-sccopy.cc
> @@ -0,0 +1,687 @@
> +/* Strongly-connected copy propagation pass for the GNU compiler.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   Contributed by Filip Kastl <fka...@suse.cz>
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "tree-pass.h"
> +#include "ssa.h"
> +#include "gimple-iterator.h"
> +#include "vec.h"
> +#include "hash-set.h"
> +#include <algorithm>
> +#include "ssa-iterators.h"
> +#include "gimple-fold.h"
> +#include "gimplify.h"
> +#include "tree-cfg.h"
> +#include "tree-eh.h"
> +#include "builtins.h"
> +#include "tree-ssa-dce.h"
> +#include "fold-const.h"
> +
> +/* Strongly connected copy propagation pass.
> +
> +   This is a lightweight copy propagation pass that is also able to eliminate
> +   redundant PHI statements.  The pass considers the following types of copy
> +   statements:
> +
> +   1 An assignment statement with a single argument.
> +
> +   _3 = _2;
> +   _4 = 5;
> +
> +   2 A degenerate PHI statement.  A degenerate PHI is a PHI that only refers 
> to
> +     itself or one other value.
> +
> +   _5 = PHI <_1>;
> +   _6 = PHI <_6, _6, _1, _1>;
> +   _7 = PHI <16, _7>;
> +
> +   3 A set of PHI statements that only refer to each other or to one other
> +     value.
> +
> +   _8 = PHI <_9, _10>;
> +   _9 = PHI <_8, _10>;
> +   _10 = PHI <_8, _9, _1>;
> +
> +   All of these statements produce copies and can be eliminated from the
> +   program.  For a copy statement we identify the value it creates a copy of
> +   and replace references to the statement with the value -- we propagate the
> +   copy.
> +
> +   _3 = _2; // Replace all occurences of _3 by _2
> +
> +   _8 = PHI <_9, _10>;
> +   _9 = PHI <_8, _10>;
> +   _10 = PHI <_8, _9, _1>; // Replace all occurences of _8, _9 and _10 by _1
> +
> +   To find all three types of copy statements we use an algorithm based on
> +   strongly-connected components (SCCs) in dataflow graph.  The algorithm was
> +   introduced in an article from 2013[1]. We describe the algorithm bellow.
> +
> +   To identify SCCs we implement the Robert Tarjan's SCC algorithm.  For the
> +   SCC computation we wrap potential copy statements in the 'vertex' struct.
> +   To each of these statements we also assign a vertex number ('vxnum'). 
> Since
> +   the main algorithm has to be able to compute SCCs of subgraphs of the 
> whole
> +   dataflow graph we use GIMPLE stmt flags to prevent Tarjan's algorithm from
> +   leaving the subgraph.
> +
> +   References:
> +
> +     [1] Simple and Efficient Construction of Static Single Assignmemnt Form,
> +     Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
> +     Section 3.2.  */
> +
> +/* Bitmap tracking statements which were propagated to be removed at the end 
> of
> +   the pass.  */
> +
> +static bitmap dead_stmts;
> +
> +/* State of vertex during SCC discovery.
> +
> +   unvisited  Vertex hasn't yet been popped from worklist.
> +   vopen      DFS has visited vertex for the first time.  Vertex has been put
> +           on Tarjan stack.
> +   closed     DFS has backtracked through vertex.  At this point, vertex
> +           doesn't have any unvisited neighbors.
> +   in_scc     Vertex has been popped from Tarjan stack.  */
> +
> +enum vstate
> +{
> +  unvisited,
> +  vopen,
> +  closed,
> +  in_scc
> +};
> +
> +/* Information about a vertex.  Used by SCC discovery.  */
> +
> +struct vertex
> +{
> +  bool active; /* scc_discovery::compute_sccs () only considers a subgraph of
> +               the whole dataflow graph.  It uses this flag so that it knows
> +               which vertices are part of this subgraph.  */
> +  vstate state;
> +  unsigned index;
> +  unsigned lowlink;
> +};
> +
> +/* SCC discovery.
> +
> +   Used to find SCCs in a dataflow graph.  Implements Tarjan's SCC
> +   algorithm.  */
> +
> +class scc_discovery
> +{
> +public:
> +  scc_discovery ();
> +  ~scc_discovery ();
> +  auto_vec<vec<gimple *>> compute_sccs (auto_vec<gimple *> &stmts);
> +
> +private:
> +  unsigned curr_generation = 0;
> +  vertex* vertices; /* Indexed by SSA_NAME_VERSION.  */
> +  auto_vec<unsigned> worklist; /* DFS stack.  */
> +  auto_vec<unsigned> stack; /* Tarjan stack.  */
> +
> +  void visit_neighbor (tree neigh_tree, unsigned parent_vxnum);
> +};
> +
> +scc_discovery::scc_discovery ()
> +{
> +  /* Create vertex struct for each SSA name.  */
> +  vertices = XNEWVEC (struct vertex, num_ssa_names);
> +  unsigned i = 0;
> +  for (i = 0; i < num_ssa_names; i++)
> +    vertices[i].active = false;
> +}
> +
> +scc_discovery::~scc_discovery ()
> +{
> +  XDELETEVEC (vertices);
> +}
> +
> +/* Part of 'scc_discovery::compute_sccs()'.  */
> +
> +void
> +scc_discovery::visit_neighbor (tree neigh_tree, unsigned parent_version)
> +{
> +  if (TREE_CODE (neigh_tree) != SSA_NAME)
> +    return; /* Skip any neighbor that isn't an SSA name.  */
> +  unsigned neigh_version = SSA_NAME_VERSION (neigh_tree);
> +
> +  /* Skip neighbors outside the subgraph that Tarjan currently works
> +     with.  */
> +  if (!vertices[neigh_version].active)
> +    return;
> +
> +  vstate neigh_state = vertices[neigh_version].state;
> +  vstate parent_state = vertices[parent_version].state;
> +  if (parent_state == vopen) /* We're currently opening parent.  */
> +    {
> +      /* Put unvisited neighbors on worklist.  Update lowlink of parent
> +      vertex according to indices of neighbors present on stack.  */
> +      switch (neigh_state)
> +     {
> +     case unvisited:
> +       worklist.safe_push (neigh_version);
> +       break;
> +     case vopen:
> +     case closed:
> +       vertices[parent_version].lowlink =
> +         std::min (vertices[parent_version].lowlink,
> +                   vertices[neigh_version].index);
> +       break;
> +     case in_scc:
> +       /* Ignore these edges.  */
> +       break;
> +     }
> +    }
> +  else if (parent_state == closed) /* We're currently closing parent.  */
> +    {
> +      /* Update lowlink of parent vertex according to lowlinks of
> +      children of parent (in terms of DFS tree).  */
> +      if (neigh_state == closed)
> +     {
> +       vertices[parent_version].lowlink =
> +         std::min (vertices[parent_version].lowlink,
> +                   vertices[neigh_version].lowlink);
> +     }
> +    }
> +}
> +
> +/* Compute SCCs in dataflow graph on given statements 'stmts'.  Ignore
> +   statements outside 'stmts'.  Return the SCCs in a reverse topological
> +   order.
> +
> +   stmt_may_generate_copy () must be true for all statements from 'stmts'!  
> */
> +
> +auto_vec<vec<gimple *>>
> +scc_discovery::compute_sccs (auto_vec<gimple *> &stmts)

Please use vec<gimple *> &stmts, not auto_vec<..>.

> +{
> +  auto_vec<vec<gimple *>> sccs;
> +
> +  for (gimple *stmt : stmts)
> +    {
> +      unsigned i;
> +      switch (gimple_code (stmt))
> +     {
> +       case GIMPLE_ASSIGN:
> +         i = SSA_NAME_VERSION (gimple_assign_lhs (stmt));
> +         break;
> +       case GIMPLE_PHI:
> +         i = SSA_NAME_VERSION (gimple_phi_result (stmt));
> +         break;
> +       default:
> +         gcc_unreachable ();
> +     }
> +
> +      vertices[i].index = 0;
> +      vertices[i].lowlink = 0;
> +      vertices[i].state = unvisited;
> +      vertices[i].active = true; /* Mark the subgraph we'll be working on so
> +                                 that we don't leave it.  */
> +
> +      worklist.safe_push (i);
> +    }
> +
> +  /* Worklist loop.  */
> +  unsigned curr_index = 0;
> +  while (!worklist.is_empty ())
> +    {
> +      unsigned i = worklist.pop ();
> +      gimple *stmt = SSA_NAME_DEF_STMT (ssa_name (i));
> +      vstate state = vertices[i].state;
> +
> +      if (state == unvisited)
> +     {
> +       vertices[i].state = vopen;
> +
> +       /* Assign index to this vertex.  */
> +       vertices[i].index = curr_index;
> +       vertices[i].lowlink = curr_index;
> +       curr_index++;
> +
> +       /* Put vertex on stack and also on worklist to be closed later.  */
> +       stack.safe_push (i);
> +       worklist.safe_push (i);
> +     }
> +      else if (state == vopen)
> +     vertices[i].state = closed;
> +
> +      /* Visit neighbors of this vertex.  */
> +      tree op;
> +      gphi *phi;
> +      switch (gimple_code (stmt))
> +     {
> +       case GIMPLE_PHI:
> +         phi = as_a <gphi *> (stmt);
> +         unsigned j;
> +         for (j = 0; j < gimple_phi_num_args (phi); j++)
> +           {
> +             op = gimple_phi_arg_def (phi, j);
> +             visit_neighbor (op, i);
> +           }
> +         break;
> +       case GIMPLE_ASSIGN:
> +         op = gimple_assign_rhs1 (stmt);
> +         visit_neighbor (op, i);
> +         break;
> +       default:
> +         gcc_unreachable ();
> +     }
> +
> +      /* If we've just closed a root vertex of an scc, pop scc from stack.  
> */
> +      if (state == vopen && vertices[i].lowlink == vertices[i].index)
> +     {
> +       vec<gimple *> scc = vNULL;
> +
> +       unsigned j;
> +       do
> +         {
> +           j = stack.pop ();
> +           scc.safe_push (SSA_NAME_DEF_STMT (ssa_name (j)));
> +           vertices[j].state = in_scc;
> +         }
> +       while (j != i);
> +
> +       sccs.safe_push (scc);
> +     }
> +    }
> +
> +  if (!stack.is_empty ())
> +    gcc_unreachable ();
> +
> +  /* Clear 'active' flags.  */
> +  for (gimple *stmt : stmts)
> +    {
> +      unsigned i;
> +      switch (gimple_code (stmt))
> +     {
> +       case GIMPLE_ASSIGN:
> +         i = SSA_NAME_VERSION (gimple_assign_lhs (stmt));
> +         break;
> +       case GIMPLE_PHI:
> +         i = SSA_NAME_VERSION (gimple_phi_result (stmt));
> +         break;
> +       default:
> +         gcc_unreachable ();
> +     }
> +
> +      vertices[i].active = false;
> +    }
> +
> +  return sccs;

I think you need to use

  return std::move (sccs);

here?  But maybe I'm confusing C++ here and this auto-magically works.

Please try to run valgrind leak check on a testcase you know the pass
performs some optimization.

> +}
> +
> +/* Could this statement potentially be a copy statement?
> +
> +   This pass only considers statements for which this function returns 
> 'true'.
> +   Those are basically PHI functions and assignment statements similar to
> +
> +   _2 = _1;
> +   or
> +   _2 = 5;  */
> +
> +static bool
> +stmt_may_generate_copy (gimple *stmt)
> +{
> +  /* A PHI may generate a copy.  */
> +  if (gimple_code (stmt) == GIMPLE_PHI)
> +    {
> +      gphi *phi = as_a <gphi *> (stmt);
> +
> +      /* No OCCURS_IN_ABNORMAL_PHI SSA names in lhs nor rhs.  */
> +      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_phi_result (phi)))
> +     return false;
> +
> +      unsigned i;
> +      for (i = 0; i < gimple_phi_num_args (phi); i++)
> +     {
> +       tree op = gimple_phi_arg_def (phi, i);
> +       if (TREE_CODE (op) == SSA_NAME
> +           && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op))
> +         return false;
> +     }
> +
> +      /* If PHI has more than one unique non-SSA arguments, it won't 
> generate a
> +      copy.  */
> +      tree const_op = NULL_TREE;
> +      for (i = 0; i < gimple_phi_num_args (phi); i++)
> +     {
> +       tree op = gimple_phi_arg_def (phi, i);
> +       if (TREE_CODE (op) != SSA_NAME)
> +         {
> +           if (const_op && !operand_equal_p (op, const_op))
> +             return false;
> +           const_op = op;
> +         }
> +     }
> +
> +      return true;
> +    }
> +
> +  /* Or a statement of type _2 = _1; OR _2 = 5; may generate a copy.  */
> +
> +  if (!gimple_assign_single_p (stmt))
> +    return false;
> +
> +  tree lhs = gimple_assign_lhs (stmt);
> +  tree rhs = gimple_assign_rhs1 (stmt);
> +
> +  if (TREE_CODE (lhs) != SSA_NAME)
> +    return false;
> +
> +  /* lhs shouldn't flow through any abnormal edges.  */
> +  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
> +    return false;
> +
> +  if (is_gimple_min_invariant (rhs))
> +    return true;  /* A statement of type _2 = 5;.  */
> +
> +  if (TREE_CODE (rhs) != SSA_NAME)
> +    return false;
> +
> +  /* rhs shouldn't flow through any abnormal edges.  */
> +  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs))
> +    return false;
> +
> +  /* It is possible that lhs has more alignment or value range information.  
> By
> +     propagating we would lose this information.  So in the case that 
> alignment
> +     or value range information differs, we are conservative and do not
> +     propagate.
> +
> +     FIXME: Propagate alignment and value range info the same way copy-prop
> +     does.  */
> +  if (POINTER_TYPE_P (TREE_TYPE (lhs))
> +      && POINTER_TYPE_P (TREE_TYPE (rhs))
> +      && SSA_NAME_PTR_INFO (lhs) != SSA_NAME_PTR_INFO (rhs))
> +    return false;
> +  if (!POINTER_TYPE_P (TREE_TYPE (lhs))
> +      && !POINTER_TYPE_P (TREE_TYPE (rhs))
> +      && SSA_NAME_RANGE_INFO (lhs) != SSA_NAME_RANGE_INFO (rhs))
> +    return false;
> +
> +  return true;  /* A statement of type _2 = _1;.  */
> +}
> +
> +/* Return all statements in cfun that could generate copies.  All statements
> +   for which stmt_may_generate_copy returns 'true'.  */
> +
> +static auto_vec<gimple *>
> +get_all_stmt_may_generate_copy (void)
> +{
> +  auto_vec<gimple *> result;
> +
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      gimple_stmt_iterator gsi;
> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +     {
> +       gimple *s = gsi_stmt (gsi);
> +       if (stmt_may_generate_copy (s))
> +         {
> +           result.safe_push (s);
> +         }
> +     }
> +
> +      gphi_iterator pi;
> +      for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
> +     {
> +       gimple *s = pi.phi ();
> +       if (stmt_may_generate_copy (s))
> +         {
> +           result.safe_push (s);
> +         }
> +     }
> +    }
> +
> +  return result;
> +}
> +
> +/* For each statement from given SCC, replace its usages by value
> +   VAL.  */
> +
> +static void
> +replace_scc_by_value (vec<gimple *> scc, tree val)
> +{
> +  for (gimple *stmt : scc)
> +    {
> +      tree name = gimple_get_lhs (stmt);
> +      replace_uses_by (name, val);
> +      bitmap_set_bit (dead_stmts, SSA_NAME_VERSION (name));
> +    }
> +
> +  if (dump_file)
> +    fprintf (dump_file, "Replacing SCC of size %d\n", scc.length ());
> +}
> +
> +/* Part of 'sccopy_propagate ()'.  */
> +
> +static void
> +sccopy_visit_op (tree op, hash_set<tree> &outer_ops,
> +              hash_set<gimple *> &scc_set, bool &is_inner,
> +              tree &last_outer_op)
> +{
> +  bool op_in_scc = false;
> +
> +  if (TREE_CODE (op) == SSA_NAME)
> +    {
> +      gimple *op_stmt = SSA_NAME_DEF_STMT (op);
> +      if (scc_set.contains (op_stmt))
> +     op_in_scc = true;

As optimization this could be a [s]bitmap indexed by
SSA_NAME_VERSION (if you have many instances of scc_set
a bitmap is better, a sbitmap has O(n) initialization overhead).

> +    }
> +
> +  if (!op_in_scc)
> +    {
> +      outer_ops.add (op);

I do wonder if it's relevant to track non-SSA_NAME 'op' here?
Otherwise the same as above would apply.

This can be done as followup (if possible) if you like.

> +      last_outer_op = op;
> +      is_inner = false;
> +    }
> +}
> +
> +/* Main function of this pass.  Find and propagate all three types of copy
> +   statements (see pass description above).
> +
> +   This is an implementation of an algorithm from the paper Simple and
> +   Efficient Construction of Static Single Assignmemnt Form[1].  It is based
> +   on strongly-connected components (SCCs) in dataflow graph.  The original
> +   algorithm only considers PHI statements.  We extend it to also consider
> +   assignment statements of type _2 = _1;.
> +
> +   The algorithm is based on this definition of a set of redundant PHIs[1]:
> +
> +     A non-empty set P of PHI functions is redundant iff the PHI functions 
> just
> +     reference each other or one other value
> +
> +   It uses this lemma[1]:
> +
> +     Let P be a redundant set of PHI functions.  Then there is a
> +     strongly-connected component S subset of P that is also redundant.
> +
> +   The algorithm works in this way:
> +
> +     1 Find SCCs
> +     2 For each SCC S in topological order:
> +     3   Construct set 'inner' of statements that only have other statements
> +      from S on their right hand side
> +     4   Construct set 'outer' of values that originate outside S and appear 
> on
> +      right hand side of some statement from S
> +     5   If |outer| = 1, outer only contains a value v.  Statements in S only
> +      refer to each other or to v -- they are redundant.  Propagate v.
> +      Else, recurse on statements in inner.
> +
> +   The implementation is non-recursive.
> +
> +   References:
> +
> +     [1] Simple and Efficient Construction of Static Single Assignmemnt Form,
> +     Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
> +     Section 3.2.  */
> +
> +static void
> +sccopy_propagate ()
> +{
> +  auto_vec<gimple *> useful_stmts = get_all_stmt_may_generate_copy ();
> +  scc_discovery discovery;
> +
> +  auto_vec<vec<gimple *>> worklist;
> +  worklist = discovery.compute_sccs (useful_stmts);

using initialization might be cheaper than first default-constructing

> +
> +  while (!worklist.is_empty ())
> +    {
> +      vec<gimple *> scc = worklist.pop ();
> +
> +      auto_vec<gimple *> inner;
> +      hash_set<tree> outer_ops;
> +      tree last_outer_op = NULL_TREE;
> +
> +      /* Prepare hash set of PHIs in scc to query later.  */

ah, that was to identify in-SCC vs. out-of-SCC stmts ...

> +      hash_set<gimple *> scc_set;
> +      for (gimple *stmt : scc)
> +     scc_set.add (stmt);
> +
> +      for (gimple *stmt : scc)
> +     {
> +       bool is_inner = true;
> +
> +       gphi *phi;
> +       tree op;
> +
> +       switch (gimple_code (stmt))
> +         {
> +           case GIMPLE_PHI:
> +             phi = as_a <gphi *> (stmt);
> +             unsigned j;
> +             for (j = 0; j < gimple_phi_num_args (phi); j++)
> +               {
> +                 op = gimple_phi_arg_def (phi, j);
> +                 sccopy_visit_op (op, outer_ops, scc_set, is_inner,
> +                                last_outer_op);
> +               }
> +             break;
> +           case GIMPLE_ASSIGN:
> +             op = gimple_assign_rhs1 (stmt);
> +             sccopy_visit_op (op, outer_ops, scc_set, is_inner,
> +                            last_outer_op);
> +             break;
> +           default:
> +             gcc_unreachable ();
> +         }
> +
> +       if (is_inner)
> +         {
> +           inner.safe_push (stmt);
> +         }

general coding-style comment, we don't put braces around
single stmts.

> +     }
> +
> +      if (outer_ops.elements () == 1)
> +     {
> +       /* The only operand in outer_ops.  */
> +       tree outer_op = last_outer_op;
> +       replace_scc_by_value (scc, outer_op);
> +     }
> +      else if (outer_ops.elements () > 1)
> +     {
> +       /* Add inner sccs to worklist.  */
> +       auto_vec<vec<gimple *>> inner_sccs =
> +         discovery.compute_sccs (inner);
> +       for (vec<gimple *> inner_scc : inner_sccs)
> +         worklist.safe_push (inner_scc);
> +     }
> +      else
> +     gcc_unreachable ();
> +
> +      scc.release ();
> +    }
> +}
> +
> +/* Called when pass execution starts.  */
> +
> +static void
> +init_sccopy (void)
> +{
> +  /* For propagated statements.  */
> +  dead_stmts = BITMAP_ALLOC (NULL);
> +}
> +
> +/* Called before pass execution ends.  */
> +
> +static void
> +finalize_sccopy (void)
> +{
> +  /* Remove all propagated statements.  */
> +  simple_dce_from_worklist (dead_stmts);
> +  BITMAP_FREE (dead_stmts);
> +
> +  /* Propagating a constant may create dead eh edges.  */
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, cfun)
> +    gimple_purge_dead_eh_edges (bb);
> +}
> +
> +namespace {
> +
> +const pass_data pass_data_sccopy =
> +{
> +  GIMPLE_PASS, /* type */
> +  "sccopy", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  ( PROP_cfg | PROP_ssa ), /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */
> +};
> +
> +class pass_sccopy : public gimple_opt_pass
> +{
> +public:
> +  pass_sccopy (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_sccopy, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *) { return true; }
> +  virtual unsigned int execute (function *);
> +  opt_pass * clone () final override { return new pass_sccopy (m_ctxt); }
> +}; // class pass_sccopy
> +
> +unsigned
> +pass_sccopy::execute (function *)
> +{
> +  init_sccopy ();
> +  sccopy_propagate ();
> +  finalize_sccopy ();
> +  return 0;
> +}
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_sccopy (gcc::context *ctxt)
> +{
> +  return new pass_sccopy (ctxt);
> +}
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 1e1950bdb39..fa6c5a2c9fa 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -100,6 +100,7 @@ along with GCC; see the file COPYING3.  If not see
>         NEXT_PASS (pass_if_to_switch);
>         NEXT_PASS (pass_convert_switch);
>         NEXT_PASS (pass_cleanup_eh);
> +       NEXT_PASS (pass_sccopy);
>         NEXT_PASS (pass_profile);
>         NEXT_PASS (pass_local_pure_const);
>         NEXT_PASS (pass_modref);
> @@ -368,6 +369,7 @@ along with GCC; see the file COPYING3.  If not see
>        However, this also causes us to misdiagnose cases that should be
>        real warnings (e.g., testsuite/gcc.dg/pr18501.c).  */
>        NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
> +      NEXT_PASS (pass_sccopy);
>        NEXT_PASS (pass_tail_calls);
>        /* Split critical edges before late uninit warning to reduce the
>           number of false positives from it.  */
> @@ -409,6 +411,7 @@ along with GCC; see the file COPYING3.  If not see
>        NEXT_PASS (pass_sancov);
>        NEXT_PASS (pass_asan);
>        NEXT_PASS (pass_tsan);
> +      NEXT_PASS (pass_sccopy);

I don't think we want to run the pass again with -Og.

OK with those changes.

Thanks,
Richard.

>        /* ???  We do want some kind of loop invariant motion, but we possibly
>           need to adjust LIM to be more friendly towards preserving accurate
>        debug information here.  */
> diff --git a/gcc/testsuite/gcc.dg/sccopy-1.c b/gcc/testsuite/gcc.dg/sccopy-1.c
> new file mode 100644
> index 00000000000..1e61a6b320e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/sccopy-1.c
> @@ -0,0 +1,78 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fgimple -fdump-tree-sccopy -O2" } */
> +/* { dg-final { scan-tree-dump "Replacing SCC of size 2" "sccopy1" } } */
> +
> +int __GIMPLE (ssa, startwith ("sccopy"))
> +main ()
> +{
> +  int a;
> +  int y;
> +  int x;
> +  int _1;
> +  int _2;
> +  int _13;
> +
> +  __BB(2):
> +  if (x_7(D) == 5)
> +    goto __BB3;
> +  else
> +    goto __BB4;
> +
> +  __BB(3):
> +  a_10 = x_7(D);
> +  goto __BB5;
> +
> +  __BB(4):
> +  a_9 = y_8(D);
> +  goto __BB5;
> +
> +  __BB(5):
> +  a_3 = __PHI (__BB3: a_10, __BB4: a_9);
> +  if (x_7(D) == y_8(D))
> +    goto __BB6;
> +  else
> +    goto __BB11;
> +
> +  __BB(6):
> +  a_11 = a_3 + 1;
> +  goto __BB7;
> +
> +  __BB(7):
> +  a_4 = __PHI (__BB6: a_11, __BB11: a_6);
> +label1:
> +  if (x_7(D) != y_8(D))
> +    goto __BB8;
> +  else
> +    goto __BB10;
> +
> +  __BB(8):
> +  goto __BB9;
> +
> +  __BB(9):
> +  a_12 = __PHI (__BB8: a_4, __BB10: a_5);
> +  goto __BB10;
> +
> +  __BB(10,loop_header(1)):
> +  a_5 = __PHI (__BB7: a_4, __BB9: a_12);
> +label2:
> +  _1 = y_8(D) * 2;
> +  if (x_7(D) == _1)
> +    goto __BB9;
> +  else
> +    goto __BB11;
> +
> +  __BB(11):
> +  a_6 = __PHI (__BB5: a_3, __BB10: a_5);
> +  _2 = x_7(D) * 3;
> +  if (y_8(D) == _2)
> +    goto __BB7;
> +  else
> +    goto __BB12;
> +
> +  __BB(12):
> +  _13 = 0;
> +  return _13;
> +
> +}
> +
> +
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c
> index bf889318c06..43770c95bca 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr79691.c
> @@ -34,4 +34,4 @@ int f4 (int i)
>  
>  /* { dg-final { scan-tree-dump-times "sprintf" 1 "optimized" } }
>     { dg-final { scan-tree-dump-times "snprintf" 1 "optimized" } }
> -   { dg-final { scan-tree-dump " = 9;" "optimized" } } */
> +   { dg-final { scan-tree-dump "return 9;" "optimized" } } */
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 09e6ada5b2f..94a48461590 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -399,6 +399,7 @@ extern gimple_opt_pass *make_pass_iv_optimize 
> (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_sccopy (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_split_paths (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to