> Am 13.12.2023 um 17:12 schrieb Filip Kastl <fka...@suse.cz>:
> 
> 
>> 
>>>> Hi,
>>>> 
>>>> this is a patch that I submitted two months ago as an RFC. I added some 
>>>> polish
>>>> since.
>>>> 
>>>> It is a new lightweight pass that removes redundant PHI functions and as a
>>>> bonus does basic copy propagation. With Jan Hubi?ka we measured that it is 
>>>> able
>>>> to remove usually more than 5% of all PHI functions when run among early 
>>>> passes
>>>> (sometimes even 13% or more). Those are mostly PHI functions that would be
>>>> later optimized away but with this pass it is possible to remove them early
>>>> enough so that they don't get streamed when runing LTO (and also 
>>>> potentially
>>>> inlined at multiple places). It is also able to remove some redundant PHIs
>>>> that otherwise would still be present during RTL expansion.
>>>> 
>>>> Jakub Jel?nek was concerned about debug info coverage so I compiled cc1plus
>>>> with and without this patch. These are the sizes of .debug_info and
>>>> .debug_loclists
>>>> 
>>>> .debug_info without patch 181694311
>>>> .debug_info    with patch 181692320
>>>> +0.0011% change
>>>> 
>>>> .debug_loclists without patch 47934753
>>>> .debug_loclists    with patch 47934966
>>>> -0.0004% change
>>>> 
>>>> I wanted to use dwlocstat to compare debug coverages but didn't manage to 
>>>> get
>>>> the program working on my machine sadly. Hope this suffices. Seems to me 
>>>> that
>>>> my patch doesn't have a significant impact on debug info.
>>>> 
>>>> Bootstraped and tested* on x86_64-pc-linux-gnu.
>>>> 
>>>> * One testcase (pr79691.c) did regress. However that is because the test is
>>>> dependent on a certain variable not being copy propagated. I will go into 
>>>> more
>>>> detail about this in a reply to this mail.
>>>> 
>>>> Ok to commit?
>>> 
>>> This is a second version of the patch.  In this version, I modified the
>>> pr79691.c testcase so that it works as intended with other changes from the
>>> patch.
>>> 
>>> The pr79691.c testcase checks that we get constants from snprintf calls and
>>> that they simplify into a single constant.  The testcase doesn't account for
>>> the fact that this constant may be further copy propagated which is exactly
>>> what happens with this patch applied.
>>> 
>>> Bootstrapped and tested on x86_64-pc-linux-gnu.
>>> 
>>> Ok to commit?
>> 
>> This is the third version of the patch. In this version, I adressed most of
>> Richards remarks about the second version. Here is a summary of changes I 
>> made:
>> 
>> - Rename the pass from tree-ssa-sccopy.cc to gimple-ssa-sccopy.cc
>> - Use simple_dce_from_worklist to remove propagated statements
>> - Use existing replace_uses API instead of reinventing it
>>  - This allowed me to get rid of some now redundant cleanup code
>> - Encapsulate the SCC finding into a class
>> - Rework stmt_may_generate_copy to get rid of redundant checks
>> - Add check that PHI doesn't contain two non-SSA-name values to
>>  stmt_may_generate_copy
>> - Regarding alignment and value ranges in stmt_may_generate_copy: For now use
>>  the conservative check that Richard suggested
>> - Index array of vertices that SCC discovery uses by SSA name version numbers
>>  instead of numbering statements myself.
>> 
>> 
>> I didn't make any changes based on these remarks:
>> 
>> 1 It might be nice to optimize SCCs of size 1 somehow, not sure how
>>  many times these appear - possibly prevent them from even entering
>>  the SCC discovery?
>> 
>> It would be nice. But the only way to do this that I see right now is to 
>> first
>> propagate SCCs of size 1 and then the rest. This would mean adding a new copy
>> propagation procedure. It wouldn't be a trivial procedure. Efficiency of the
>> pass relies on having SCCs topogically sorted so this procedure would have to
>> implement some topological sort algorithm.
>> 
>> This could be done. It could save allocating some vec<>s (right now, SCCs of
>> size 1 are represented by a vec<> with a single element). But is it worth it 
>> to
>> optimize the pass this way right now? If possible, I'd like to see that the
>> pass works and sort out any problems people encounter with it before I start
>> optimizing it.
>> 
>> 2 Instead of collecting all stmts that may generate a copy at the beginning 
>> of
>>  the pass into a vec<>, let the SCC discovery check that statements may
>>  generate a copy on the fly.
>> 
>> This would be a big change to the pass, it would require a lot of reworking.
>> I'm also not sure if this would help reduce the number of allocated vec<>s 
>> that
>> much because I'll still want to represent SCCs by vec<>s.
>> 
>> Again - its possible I'll want to rework the pass in this way in the future 
>> but
>> I'd like to leave it as it is for now.
>> 
>> 3 Add a comment saying that the pass is doing optimistic copy propagation
>> 
>> I don't think the pass works in an optimistic way. It doesn't assume that all
>> variables are copies of each other at any point. It instead identifies copy
>> statements (or PHI SCCs that act as copy statements) and then for each of 
>> these
>> it propagates: By that I mean if a copy statement says that _3 is a copy of 
>> _2,
>> then it replaces all uses of _3 by _2.
>> 
>> But its possible that I still misinterpret what 'optimistic' means. If
>> mentioning that it works in an optimistic way would help readability of the
>> pass, I'm happy to add that comment.
>> 
>> Richard, is the patch ok as it is now? Or do you think that making changes 
>> 1, 2
>> or 3 is necessary? Or are there any other issues which should be adressed?
>> 
>> Filip
> 
> This is the fourth version of this patch. Changes in this version:
> - Disabled the pass for -Og
>  - This meant that I could revert my changes to gcc.dg/tree-ssa/pr79691.c
> - Fixed formatting of the patch (like braces around a single-statment if body)
> - compute_sccs (auto_vec<>...) -> compute_sccs (vec<> ...)
> - auto_vec<gimple *> worklist; worklist = ...; ->
>  auto_vec<gimple *> worklist = ...;
> 
> I've also checked that there are no memory leak issues. Valgrind doesn't 
> detect
> any aditional leaks when running with sccopy vs running without sccopy:
> 
> [fkastl@scala playground]$ valgrind $HOME/gcc/inst/bin/gcc 
> /home/fkastl/gcc/src/gcc/testsuite/gcc.dg/sccopy-1.c -fgimple -O2
> ==27363== Memcheck, a memory error detector
> ==27363== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
> ==27363== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
> ==27363== Command: /home/fkastl/gcc/inst/bin/gcc 
> /home/fkastl/gcc/src/gcc/testsuite/gcc.dg/sccopy-1.c -fgimple -O2
> ==27363== 
> ==27363== 
> ==27363== HEAP SUMMARY:
> ==27363==     in use at exit: 174,234 bytes in 92 blocks
> ==27363==   total heap usage: 408 allocs, 316 frees, 228,050 bytes allocated
> ==27363== 
> ==27363== LEAK SUMMARY:
> ==27363==    definitely lost: 5,935 bytes in 23 blocks
> ==27363==    indirectly lost: 82 bytes in 5 blocks
> ==27363==      possibly lost: 0 bytes in 0 blocks
> ==27363==    still reachable: 168,217 bytes in 64 blocks
> ==27363==                       of which reachable via heuristic:
> ==27363==                         newarray           : 1,544 bytes in 1 blocks
> ==27363==         suppressed: 0 bytes in 0 blocks
> ==27363== Rerun with --leak-check=full to see details of leaked memory
> ==27363== 
> ==27363== For lists of detected and suppressed errors, rerun with: -s
> ==27363== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
> 
> [fkastl@scala playground]$ valgrind $HOME/gcc/inst/bin/gcc 
> /home/fkastl/gcc/src/gcc/testsuite/gcc.dg/sccopy-1.c -fgimple -O2 
> -fdisable-tree-sccopy1 -fdisable-tree-sccopy2
> ==27372== Memcheck, a memory error detector
> ==27372== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
> ==27372== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
> ==27372== Command: /home/fkastl/gcc/inst/bin/gcc 
> /home/fkastl/gcc/src/gcc/testsuite/gcc.dg/sccopy-1.c -fgimple -O2 
> -fdisable-tree-sccopy1 -fdisable-tree-sccopy2
> ==27372== 
> cc1: note: disable pass tree-sccopy1 for functions in the range of [0, 
> 4294967295]
> cc1: note: disable pass tree-sccopy2 for functions in the range of [0, 
> 4294967295]
> ==27372== 
> ==27372== HEAP SUMMARY:
> ==27372==     in use at exit: 174,282 bytes in 92 blocks
> ==27372==   total heap usage: 408 allocs, 316 frees, 228,674 bytes allocated
> ==27372== 
> ==27372== LEAK SUMMARY:
> ==27372==    definitely lost: 5,935 bytes in 23 blocks
> ==27372==    indirectly lost: 82 bytes in 5 blocks
> ==27372==      possibly lost: 0 bytes in 0 blocks
> ==27372==    still reachable: 168,265 bytes in 64 blocks
> ==27372==                       of which reachable via heuristic:
> ==27372==                         newarray           : 1,544 bytes in 1 blocks
> ==27372==         suppressed: 0 bytes in 0 blocks
> ==27372== Rerun with --leak-check=full to see details of leaked memory
> ==27372== 
> ==27372== For lists of detected and suppressed errors, rerun with: -s
> ==27372== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
> 
> Richard, I wrote down the bitmap vs hash_set optimizations you suggested so
> that I can work on the pass more later and optimize it.
> 
> I didn't yet bootstrap this version. I just regtested it. Will bootstrap and
> regtest on the most current commit. Once that is successfully done, is the 
> pass
> OK to be pushed to main?

Yes.

Thanks,
Richard 


> Filip
> 
> -- >8 --
> 
> This patch adds the strongly-connected copy propagation (SCCOPY) pass.
> It is a lightweight GIMPLE copy propagation pass that also removes some
> redundant PHI statements. It handles degenerate PHIs, e.g.:
> 
> _5 = PHI <_1>;
> _6 = PHI <_6, _6, _1, _1>;
> _7 = PHI <16, _7>;
> // Replaces occurences of _5 and _6 by _1 and _7 by 16
> 
> It also handles more complicated situations, e.g.:
> 
> _8 = PHI <_9, _10>;
> _9 = PHI <_8, _10>;
> _10 = PHI <_8, _9, _1>;
> // Replaces occurences of _8, _9 and _10 by _1
> 
> gcc/ChangeLog:
> 
>    * Makefile.in: Added sccopy pass.
>    * passes.def: Added sccopy pass before LTO streaming and before
>      RTL expansion.
>    * tree-pass.h (make_pass_sccopy): Added sccopy pass.
>    * gimple-ssa-sccopy.cc: New file.
> 
> gcc/testsuite/ChangeLog:
> 
>    * gcc.dg/sccopy-1.c: New test.
> 
> Signed-off-by: Filip Kastl <fka...@suse.cz>
> ---
> gcc/Makefile.in                 |   1 +
> gcc/gimple-ssa-sccopy.cc        | 680 ++++++++++++++++++++++++++++++++
> gcc/passes.def                  |   2 +
> gcc/testsuite/gcc.dg/sccopy-1.c |  78 ++++
> gcc/tree-pass.h                 |   1 +
> 5 files changed, 762 insertions(+)
> create mode 100644 gcc/gimple-ssa-sccopy.cc
> create mode 100644 gcc/testsuite/gcc.dg/sccopy-1.c
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 753f2f36618..2e6f2fef1ba 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1497,6 +1497,7 @@ OBJS = \
>    gimple-ssa-backprop.o \
>    gimple-ssa-isolate-paths.o \
>    gimple-ssa-nonnull-compare.o \
> +    gimple-ssa-sccopy.o \
>    gimple-ssa-split-paths.o \
>    gimple-ssa-store-merging.o \
>    gimple-ssa-strength-reduction.o \
> diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
> new file mode 100644
> index 00000000000..ac5ec32eb32
> --- /dev/null
> +++ b/gcc/gimple-ssa-sccopy.cc
> @@ -0,0 +1,680 @@
> +/* Strongly-connected copy propagation pass for the GNU compiler.
> +   Copyright (C) 2023 Free Software Foundation, Inc.
> +   Contributed by Filip Kastl <fka...@suse.cz>
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "tree-pass.h"
> +#include "ssa.h"
> +#include "gimple-iterator.h"
> +#include "vec.h"
> +#include "hash-set.h"
> +#include <algorithm>
> +#include "ssa-iterators.h"
> +#include "gimple-fold.h"
> +#include "gimplify.h"
> +#include "tree-cfg.h"
> +#include "tree-eh.h"
> +#include "builtins.h"
> +#include "tree-ssa-dce.h"
> +#include "fold-const.h"
> +
> +/* Strongly connected copy propagation pass.
> +
> +   This is a lightweight copy propagation pass that is also able to eliminate
> +   redundant PHI statements.  The pass considers the following types of copy
> +   statements:
> +
> +   1 An assignment statement with a single argument.
> +
> +   _3 = _2;
> +   _4 = 5;
> +
> +   2 A degenerate PHI statement.  A degenerate PHI is a PHI that only refers 
> to
> +     itself or one other value.
> +
> +   _5 = PHI <_1>;
> +   _6 = PHI <_6, _6, _1, _1>;
> +   _7 = PHI <16, _7>;
> +
> +   3 A set of PHI statements that only refer to each other or to one other
> +     value.
> +
> +   _8 = PHI <_9, _10>;
> +   _9 = PHI <_8, _10>;
> +   _10 = PHI <_8, _9, _1>;
> +
> +   All of these statements produce copies and can be eliminated from the
> +   program.  For a copy statement we identify the value it creates a copy of
> +   and replace references to the statement with the value -- we propagate the
> +   copy.
> +
> +   _3 = _2; // Replace all occurences of _3 by _2
> +
> +   _8 = PHI <_9, _10>;
> +   _9 = PHI <_8, _10>;
> +   _10 = PHI <_8, _9, _1>; // Replace all occurences of _8, _9 and _10 by _1
> +
> +   To find all three types of copy statements we use an algorithm based on
> +   strongly-connected components (SCCs) in dataflow graph.  The algorithm was
> +   introduced in an article from 2013[1]. We describe the algorithm bellow.
> +
> +   To identify SCCs we implement the Robert Tarjan's SCC algorithm.  For the
> +   SCC computation we wrap potential copy statements in the 'vertex' struct.
> +   To each of these statements we also assign a vertex number ('vxnum'). 
> Since
> +   the main algorithm has to be able to compute SCCs of subgraphs of the 
> whole
> +   dataflow graph we use GIMPLE stmt flags to prevent Tarjan's algorithm from
> +   leaving the subgraph.
> +
> +   References:
> +
> +     [1] Simple and Efficient Construction of Static Single Assignmemnt Form,
> +     Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
> +     Section 3.2.  */
> +
> +/* Bitmap tracking statements which were propagated to be removed at the end 
> of
> +   the pass.  */
> +
> +static bitmap dead_stmts;
> +
> +/* State of vertex during SCC discovery.
> +
> +   unvisited  Vertex hasn't yet been popped from worklist.
> +   vopen      DFS has visited vertex for the first time.  Vertex has been put
> +          on Tarjan stack.
> +   closed     DFS has backtracked through vertex.  At this point, vertex
> +          doesn't have any unvisited neighbors.
> +   in_scc     Vertex has been popped from Tarjan stack.  */
> +
> +enum vstate
> +{
> +  unvisited,
> +  vopen,
> +  closed,
> +  in_scc
> +};
> +
> +/* Information about a vertex.  Used by SCC discovery.  */
> +
> +struct vertex
> +{
> +  bool active; /* scc_discovery::compute_sccs () only considers a subgraph of
> +          the whole dataflow graph.  It uses this flag so that it knows
> +          which vertices are part of this subgraph.  */
> +  vstate state;
> +  unsigned index;
> +  unsigned lowlink;
> +};
> +
> +/* SCC discovery.
> +
> +   Used to find SCCs in a dataflow graph.  Implements Tarjan's SCC
> +   algorithm.  */
> +
> +class scc_discovery
> +{
> +public:
> +  scc_discovery ();
> +  ~scc_discovery ();
> +  auto_vec<vec<gimple *>> compute_sccs (vec<gimple *> &stmts);
> +
> +private:
> +  unsigned curr_generation = 0;
> +  vertex* vertices; /* Indexed by SSA_NAME_VERSION.  */
> +  auto_vec<unsigned> worklist; /* DFS stack.  */
> +  auto_vec<unsigned> stack; /* Tarjan stack.  */
> +
> +  void visit_neighbor (tree neigh_tree, unsigned parent_vxnum);
> +};
> +
> +scc_discovery::scc_discovery ()
> +{
> +  /* Create vertex struct for each SSA name.  */
> +  vertices = XNEWVEC (struct vertex, num_ssa_names);
> +  unsigned i = 0;
> +  for (i = 0; i < num_ssa_names; i++)
> +    vertices[i].active = false;
> +}
> +
> +scc_discovery::~scc_discovery ()
> +{
> +  XDELETEVEC (vertices);
> +}
> +
> +/* Part of 'scc_discovery::compute_sccs ()'.  */
> +
> +void
> +scc_discovery::visit_neighbor (tree neigh_tree, unsigned parent_version)
> +{
> +  if (TREE_CODE (neigh_tree) != SSA_NAME)
> +    return; /* Skip any neighbor that isn't an SSA name.  */
> +  unsigned neigh_version = SSA_NAME_VERSION (neigh_tree);
> +
> +  /* Skip neighbors outside the subgraph that Tarjan currently works
> +     with.  */
> +  if (!vertices[neigh_version].active)
> +    return;
> +
> +  vstate neigh_state = vertices[neigh_version].state;
> +  vstate parent_state = vertices[parent_version].state;
> +  if (parent_state == vopen) /* We're currently opening parent.  */
> +    {
> +      /* Put unvisited neighbors on worklist.  Update lowlink of parent
> +     vertex according to indices of neighbors present on stack.  */
> +      switch (neigh_state)
> +    {
> +    case unvisited:
> +      worklist.safe_push (neigh_version);
> +      break;
> +    case vopen:
> +    case closed:
> +      vertices[parent_version].lowlink
> +        = std::min (vertices[parent_version].lowlink,
> +            vertices[neigh_version].index);
> +      break;
> +    case in_scc:
> +      /* Ignore these edges.  */
> +      break;
> +    }
> +    }
> +  else if (parent_state == closed) /* We're currently closing parent.  */
> +    {
> +      /* Update lowlink of parent vertex according to lowlinks of
> +     children of parent (in terms of DFS tree).  */
> +      if (neigh_state == closed)
> +    {
> +      vertices[parent_version].lowlink
> +        = std::min (vertices[parent_version].lowlink,
> +            vertices[neigh_version].lowlink);
> +    }
> +    }
> +}
> +
> +/* Compute SCCs in dataflow graph on given statements 'stmts'.  Ignore
> +   statements outside 'stmts'.  Return the SCCs in a reverse topological
> +   order.
> +
> +   stmt_may_generate_copy () must be true for all statements from 'stmts'!  
> */
> +
> +auto_vec<vec<gimple *>>
> +scc_discovery::compute_sccs (vec<gimple *> &stmts)
> +{
> +  auto_vec<vec<gimple *>> sccs;
> +
> +  for (gimple *stmt : stmts)
> +    {
> +      unsigned i;
> +      switch (gimple_code (stmt))
> +    {
> +      case GIMPLE_ASSIGN:
> +        i = SSA_NAME_VERSION (gimple_assign_lhs (stmt));
> +        break;
> +      case GIMPLE_PHI:
> +        i = SSA_NAME_VERSION (gimple_phi_result (stmt));
> +        break;
> +      default:
> +        gcc_unreachable ();
> +    }
> +
> +      vertices[i].index = 0;
> +      vertices[i].lowlink = 0;
> +      vertices[i].state = unvisited;
> +      vertices[i].active = true; /* Mark the subgraph we'll be working on so
> +                    that we don't leave it.  */
> +
> +      worklist.safe_push (i);
> +    }
> +
> +  /* Worklist loop.  */
> +  unsigned curr_index = 0;
> +  while (!worklist.is_empty ())
> +    {
> +      unsigned i = worklist.pop ();
> +      gimple *stmt = SSA_NAME_DEF_STMT (ssa_name (i));
> +      vstate state = vertices[i].state;
> +
> +      if (state == unvisited)
> +    {
> +      vertices[i].state = vopen;
> +
> +      /* Assign index to this vertex.  */
> +      vertices[i].index = curr_index;
> +      vertices[i].lowlink = curr_index;
> +      curr_index++;
> +
> +      /* Put vertex on stack and also on worklist to be closed later.  */
> +      stack.safe_push (i);
> +      worklist.safe_push (i);
> +    }
> +      else if (state == vopen)
> +    vertices[i].state = closed;
> +
> +      /* Visit neighbors of this vertex.  */
> +      tree op;
> +      gphi *phi;
> +      switch (gimple_code (stmt))
> +    {
> +      case GIMPLE_PHI:
> +        phi = as_a <gphi *> (stmt);
> +        unsigned j;
> +        for (j = 0; j < gimple_phi_num_args (phi); j++)
> +          {
> +        op = gimple_phi_arg_def (phi, j);
> +        visit_neighbor (op, i);
> +          }
> +        break;
> +      case GIMPLE_ASSIGN:
> +        op = gimple_assign_rhs1 (stmt);
> +        visit_neighbor (op, i);
> +        break;
> +      default:
> +        gcc_unreachable ();
> +    }
> +
> +      /* If we've just closed a root vertex of an scc, pop scc from stack.  
> */
> +      if (state == vopen && vertices[i].lowlink == vertices[i].index)
> +    {
> +      vec<gimple *> scc = vNULL;
> +
> +      unsigned j;
> +      do
> +        {
> +          j = stack.pop ();
> +          scc.safe_push (SSA_NAME_DEF_STMT (ssa_name (j)));
> +          vertices[j].state = in_scc;
> +        }
> +      while (j != i);
> +
> +      sccs.safe_push (scc);
> +    }
> +    }
> +
> +  if (!stack.is_empty ())
> +    gcc_unreachable ();
> +
> +  /* Clear 'active' flags.  */
> +  for (gimple *stmt : stmts)
> +    {
> +      unsigned i;
> +      switch (gimple_code (stmt))
> +    {
> +      case GIMPLE_ASSIGN:
> +        i = SSA_NAME_VERSION (gimple_assign_lhs (stmt));
> +        break;
> +      case GIMPLE_PHI:
> +        i = SSA_NAME_VERSION (gimple_phi_result (stmt));
> +        break;
> +      default:
> +        gcc_unreachable ();
> +    }
> +
> +      vertices[i].active = false;
> +    }
> +
> +  return sccs;
> +}
> +
> +/* Could this statement potentially be a copy statement?
> +
> +   This pass only considers statements for which this function returns 
> 'true'.
> +   Those are basically PHI functions and assignment statements similar to
> +
> +   _2 = _1;
> +   or
> +   _2 = 5;  */
> +
> +static bool
> +stmt_may_generate_copy (gimple *stmt)
> +{
> +  /* A PHI may generate a copy.  */
> +  if (gimple_code (stmt) == GIMPLE_PHI)
> +    {
> +      gphi *phi = as_a <gphi *> (stmt);
> +
> +      /* No OCCURS_IN_ABNORMAL_PHI SSA names in lhs nor rhs.  */
> +      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_phi_result (phi)))
> +    return false;
> +
> +      unsigned i;
> +      for (i = 0; i < gimple_phi_num_args (phi); i++)
> +    {
> +      tree op = gimple_phi_arg_def (phi, i);
> +      if (TREE_CODE (op) == SSA_NAME
> +          && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op))
> +        return false;
> +    }
> +
> +      /* If PHI has more than one unique non-SSA arguments, it won't 
> generate a
> +     copy.  */
> +      tree const_op = NULL_TREE;
> +      for (i = 0; i < gimple_phi_num_args (phi); i++)
> +    {
> +      tree op = gimple_phi_arg_def (phi, i);
> +      if (TREE_CODE (op) != SSA_NAME)
> +        {
> +          if (const_op && !operand_equal_p (op, const_op))
> +        return false;
> +          const_op = op;
> +        }
> +    }
> +
> +      return true;
> +    }
> +
> +  /* Or a statement of type _2 = _1; OR _2 = 5; may generate a copy.  */
> +
> +  if (!gimple_assign_single_p (stmt))
> +    return false;
> +
> +  tree lhs = gimple_assign_lhs (stmt);
> +  tree rhs = gimple_assign_rhs1 (stmt);
> +
> +  if (TREE_CODE (lhs) != SSA_NAME)
> +    return false;
> +
> +  /* lhs shouldn't flow through any abnormal edges.  */
> +  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
> +    return false;
> +
> +  if (is_gimple_min_invariant (rhs))
> +    return true;  /* A statement of type _2 = 5;.  */
> +
> +  if (TREE_CODE (rhs) != SSA_NAME)
> +    return false;
> +
> +  /* rhs shouldn't flow through any abnormal edges.  */
> +  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs))
> +    return false;
> +
> +  /* It is possible that lhs has more alignment or value range information.  
> By
> +     propagating we would lose this information.  So in the case that 
> alignment
> +     or value range information differs, we are conservative and do not
> +     propagate.
> +
> +     FIXME: Propagate alignment and value range info the same way copy-prop
> +     does.  */
> +  if (POINTER_TYPE_P (TREE_TYPE (lhs))
> +      && POINTER_TYPE_P (TREE_TYPE (rhs))
> +      && SSA_NAME_PTR_INFO (lhs) != SSA_NAME_PTR_INFO (rhs))
> +    return false;
> +  if (!POINTER_TYPE_P (TREE_TYPE (lhs))
> +      && !POINTER_TYPE_P (TREE_TYPE (rhs))
> +      && SSA_NAME_RANGE_INFO (lhs) != SSA_NAME_RANGE_INFO (rhs))
> +    return false;
> +
> +  return true;  /* A statement of type _2 = _1;.  */
> +}
> +
> +/* Return all statements in cfun that could generate copies.  All statements
> +   for which stmt_may_generate_copy returns 'true'.  */
> +
> +static auto_vec<gimple *>
> +get_all_stmt_may_generate_copy (void)
> +{
> +  auto_vec<gimple *> result;
> +
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      gimple_stmt_iterator gsi;
> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +    {
> +      gimple *s = gsi_stmt (gsi);
> +      if (stmt_may_generate_copy (s))
> +        result.safe_push (s);
> +    }
> +
> +      gphi_iterator pi;
> +      for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
> +    {
> +      gimple *s = pi.phi ();
> +      if (stmt_may_generate_copy (s))
> +        result.safe_push (s);
> +    }
> +    }
> +
> +  return result;
> +}
> +
> +/* For each statement from given SCC, replace its usages by value
> +   VAL.  */
> +
> +static void
> +replace_scc_by_value (vec<gimple *> scc, tree val)
> +{
> +  for (gimple *stmt : scc)
> +    {
> +      tree name = gimple_get_lhs (stmt);
> +      replace_uses_by (name, val);
> +      bitmap_set_bit (dead_stmts, SSA_NAME_VERSION (name));
> +    }
> +
> +  if (dump_file)
> +    fprintf (dump_file, "Replacing SCC of size %d\n", scc.length ());
> +}
> +
> +/* Part of 'sccopy_propagate ()'.  */
> +
> +static void
> +sccopy_visit_op (tree op, hash_set<tree> &outer_ops,
> +         hash_set<gimple *> &scc_set, bool &is_inner,
> +         tree &last_outer_op)
> +{
> +  bool op_in_scc = false;
> +
> +  if (TREE_CODE (op) == SSA_NAME)
> +    {
> +      gimple *op_stmt = SSA_NAME_DEF_STMT (op);
> +      if (scc_set.contains (op_stmt))
> +    op_in_scc = true;
> +    }
> +
> +  if (!op_in_scc)
> +    {
> +      outer_ops.add (op);
> +      last_outer_op = op;
> +      is_inner = false;
> +    }
> +}
> +
> +/* Main function of this pass.  Find and propagate all three types of copy
> +   statements (see pass description above).
> +
> +   This is an implementation of an algorithm from the paper Simple and
> +   Efficient Construction of Static Single Assignmemnt Form[1].  It is based
> +   on strongly-connected components (SCCs) in dataflow graph.  The original
> +   algorithm only considers PHI statements.  We extend it to also consider
> +   assignment statements of type _2 = _1;.
> +
> +   The algorithm is based on this definition of a set of redundant PHIs[1]:
> +
> +     A non-empty set P of PHI functions is redundant iff the PHI functions 
> just
> +     reference each other or one other value
> +
> +   It uses this lemma[1]:
> +
> +     Let P be a redundant set of PHI functions.  Then there is a
> +     strongly-connected component S subset of P that is also redundant.
> +
> +   The algorithm works in this way:
> +
> +     1 Find SCCs
> +     2 For each SCC S in topological order:
> +     3   Construct set 'inner' of statements that only have other statements
> +     from S on their right hand side
> +     4   Construct set 'outer' of values that originate outside S and appear 
> on
> +     right hand side of some statement from S
> +     5   If |outer| = 1, outer only contains a value v.  Statements in S only
> +     refer to each other or to v -- they are redundant.  Propagate v.
> +     Else, recurse on statements in inner.
> +
> +   The implementation is non-recursive.
> +
> +   References:
> +
> +     [1] Simple and Efficient Construction of Static Single Assignmemnt Form,
> +     Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
> +     Section 3.2.  */
> +
> +static void
> +sccopy_propagate ()
> +{
> +  auto_vec<gimple *> useful_stmts = get_all_stmt_may_generate_copy ();
> +  scc_discovery discovery;
> +
> +  auto_vec<vec<gimple *>> worklist = discovery.compute_sccs (useful_stmts);
> +
> +  while (!worklist.is_empty ())
> +    {
> +      vec<gimple *> scc = worklist.pop ();
> +
> +      auto_vec<gimple *> inner;
> +      hash_set<tree> outer_ops;
> +      tree last_outer_op = NULL_TREE;
> +
> +      /* Prepare hash set of PHIs in scc to query later.  */
> +      hash_set<gimple *> scc_set;
> +      for (gimple *stmt : scc)
> +    scc_set.add (stmt);
> +
> +      for (gimple *stmt : scc)
> +    {
> +      bool is_inner = true;
> +
> +      gphi *phi;
> +      tree op;
> +
> +      switch (gimple_code (stmt))
> +        {
> +          case GIMPLE_PHI:
> +        phi = as_a <gphi *> (stmt);
> +        unsigned j;
> +        for (j = 0; j < gimple_phi_num_args (phi); j++)
> +          {
> +            op = gimple_phi_arg_def (phi, j);
> +            sccopy_visit_op (op, outer_ops, scc_set, is_inner,
> +                   last_outer_op);
> +          }
> +        break;
> +          case GIMPLE_ASSIGN:
> +        op = gimple_assign_rhs1 (stmt);
> +        sccopy_visit_op (op, outer_ops, scc_set, is_inner,
> +                   last_outer_op);
> +        break;
> +          default:
> +        gcc_unreachable ();
> +        }
> +
> +      if (is_inner)
> +        inner.safe_push (stmt);
> +    }
> +
> +      if (outer_ops.elements () == 1)
> +    {
> +      /* The only operand in outer_ops.  */
> +      tree outer_op = last_outer_op;
> +      replace_scc_by_value (scc, outer_op);
> +    }
> +      else if (outer_ops.elements () > 1)
> +    {
> +      /* Add inner sccs to worklist.  */
> +      auto_vec<vec<gimple *>> inner_sccs
> +        = discovery.compute_sccs (inner);
> +      for (vec<gimple *> inner_scc : inner_sccs)
> +        worklist.safe_push (inner_scc);
> +    }
> +      else
> +    gcc_unreachable ();
> +
> +      scc.release ();
> +    }
> +}
> +
> +/* Called when pass execution starts.  */
> +
> +static void
> +init_sccopy (void)
> +{
> +  /* For propagated statements.  */
> +  dead_stmts = BITMAP_ALLOC (NULL);
> +}
> +
> +/* Called before pass execution ends.  */
> +
> +static void
> +finalize_sccopy (void)
> +{
> +  /* Remove all propagated statements.  */
> +  simple_dce_from_worklist (dead_stmts);
> +  BITMAP_FREE (dead_stmts);
> +
> +  /* Propagating a constant may create dead eh edges.  */
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, cfun)
> +    gimple_purge_dead_eh_edges (bb);
> +}
> +
> +namespace {
> +
> +const pass_data pass_data_sccopy =
> +{
> +  GIMPLE_PASS, /* type */
> +  "sccopy", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  ( PROP_cfg | PROP_ssa ), /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */
> +};
> +
> +class pass_sccopy : public gimple_opt_pass
> +{
> +public:
> +  pass_sccopy (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_sccopy, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *) { return true; }
> +  virtual unsigned int execute (function *);
> +  opt_pass * clone () final override { return new pass_sccopy (m_ctxt); }
> +}; // class pass_sccopy
> +
> +unsigned
> +pass_sccopy::execute (function *)
> +{
> +  init_sccopy ();
> +  sccopy_propagate ();
> +  finalize_sccopy ();
> +  return 0;
> +}
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_sccopy (gcc::context *ctxt)
> +{
> +  return new pass_sccopy (ctxt);
> +}
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 1e1950bdb39..048ad3ee7a5 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -100,6 +100,7 @@ along with GCC; see the file COPYING3.  If not see
>      NEXT_PASS (pass_if_to_switch);
>      NEXT_PASS (pass_convert_switch);
>      NEXT_PASS (pass_cleanup_eh);
> +      NEXT_PASS (pass_sccopy);
>      NEXT_PASS (pass_profile);
>      NEXT_PASS (pass_local_pure_const);
>      NEXT_PASS (pass_modref);
> @@ -368,6 +369,7 @@ along with GCC; see the file COPYING3.  If not see
>     However, this also causes us to misdiagnose cases that should be
>     real warnings (e.g., testsuite/gcc.dg/pr18501.c).  */
>       NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
> +      NEXT_PASS (pass_sccopy);
>       NEXT_PASS (pass_tail_calls);
>       /* Split critical edges before late uninit warning to reduce the
>          number of false positives from it.  */
> diff --git a/gcc/testsuite/gcc.dg/sccopy-1.c b/gcc/testsuite/gcc.dg/sccopy-1.c
> new file mode 100644
> index 00000000000..1e61a6b320e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/sccopy-1.c
> @@ -0,0 +1,78 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fgimple -fdump-tree-sccopy -O2" } */
> +/* { dg-final { scan-tree-dump "Replacing SCC of size 2" "sccopy1" } } */
> +
> +int __GIMPLE (ssa, startwith ("sccopy"))
> +main ()
> +{
> +  int a;
> +  int y;
> +  int x;
> +  int _1;
> +  int _2;
> +  int _13;
> +
> +  __BB(2):
> +  if (x_7(D) == 5)
> +    goto __BB3;
> +  else
> +    goto __BB4;
> +
> +  __BB(3):
> +  a_10 = x_7(D);
> +  goto __BB5;
> +
> +  __BB(4):
> +  a_9 = y_8(D);
> +  goto __BB5;
> +
> +  __BB(5):
> +  a_3 = __PHI (__BB3: a_10, __BB4: a_9);
> +  if (x_7(D) == y_8(D))
> +    goto __BB6;
> +  else
> +    goto __BB11;
> +
> +  __BB(6):
> +  a_11 = a_3 + 1;
> +  goto __BB7;
> +
> +  __BB(7):
> +  a_4 = __PHI (__BB6: a_11, __BB11: a_6);
> +label1:
> +  if (x_7(D) != y_8(D))
> +    goto __BB8;
> +  else
> +    goto __BB10;
> +
> +  __BB(8):
> +  goto __BB9;
> +
> +  __BB(9):
> +  a_12 = __PHI (__BB8: a_4, __BB10: a_5);
> +  goto __BB10;
> +
> +  __BB(10,loop_header(1)):
> +  a_5 = __PHI (__BB7: a_4, __BB9: a_12);
> +label2:
> +  _1 = y_8(D) * 2;
> +  if (x_7(D) == _1)
> +    goto __BB9;
> +  else
> +    goto __BB11;
> +
> +  __BB(11):
> +  a_6 = __PHI (__BB5: a_3, __BB10: a_5);
> +  _2 = x_7(D) * 3;
> +  if (y_8(D) == _2)
> +    goto __BB7;
> +  else
> +    goto __BB12;
> +
> +  __BB(12):
> +  _13 = 0;
> +  return _13;
> +
> +}
> +
> +
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 09e6ada5b2f..94a48461590 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -399,6 +399,7 @@ extern gimple_opt_pass *make_pass_iv_optimize 
> (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_sccopy (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_split_paths (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
> --
> 2.43.0
> 

Reply via email to