> Am 13.12.2023 um 17:12 schrieb Filip Kastl <fka...@suse.cz>:
>
>
>>
>>>> Hi,
>>>>
>>>> this is a patch that I submitted two months ago as an RFC. I added some
>>>> polish
>>>> since.
>>>>
>>>> It is a new lightweight pass that removes redundant PHI functions and as a
>>>> bonus does basic copy propagation. With Jan Hubi?ka we measured that it is
>>>> able
>>>> to remove usually more than 5% of all PHI functions when run among early
>>>> passes
>>>> (sometimes even 13% or more). Those are mostly PHI functions that would be
>>>> later optimized away but with this pass it is possible to remove them early
>>>> enough so that they don't get streamed when runing LTO (and also
>>>> potentially
>>>> inlined at multiple places). It is also able to remove some redundant PHIs
>>>> that otherwise would still be present during RTL expansion.
>>>>
>>>> Jakub Jel?nek was concerned about debug info coverage so I compiled cc1plus
>>>> with and without this patch. These are the sizes of .debug_info and
>>>> .debug_loclists
>>>>
>>>> .debug_info without patch 181694311
>>>> .debug_info with patch 181692320
>>>> +0.0011% change
>>>>
>>>> .debug_loclists without patch 47934753
>>>> .debug_loclists with patch 47934966
>>>> -0.0004% change
>>>>
>>>> I wanted to use dwlocstat to compare debug coverages but didn't manage to
>>>> get
>>>> the program working on my machine sadly. Hope this suffices. Seems to me
>>>> that
>>>> my patch doesn't have a significant impact on debug info.
>>>>
>>>> Bootstraped and tested* on x86_64-pc-linux-gnu.
>>>>
>>>> * One testcase (pr79691.c) did regress. However that is because the test is
>>>> dependent on a certain variable not being copy propagated. I will go into
>>>> more
>>>> detail about this in a reply to this mail.
>>>>
>>>> Ok to commit?
>>>
>>> This is a second version of the patch. In this version, I modified the
>>> pr79691.c testcase so that it works as intended with other changes from the
>>> patch.
>>>
>>> The pr79691.c testcase checks that we get constants from snprintf calls and
>>> that they simplify into a single constant. The testcase doesn't account for
>>> the fact that this constant may be further copy propagated which is exactly
>>> what happens with this patch applied.
>>>
>>> Bootstrapped and tested on x86_64-pc-linux-gnu.
>>>
>>> Ok to commit?
>>
>> This is the third version of the patch. In this version, I adressed most of
>> Richards remarks about the second version. Here is a summary of changes I
>> made:
>>
>> - Rename the pass from tree-ssa-sccopy.cc to gimple-ssa-sccopy.cc
>> - Use simple_dce_from_worklist to remove propagated statements
>> - Use existing replace_uses API instead of reinventing it
>> - This allowed me to get rid of some now redundant cleanup code
>> - Encapsulate the SCC finding into a class
>> - Rework stmt_may_generate_copy to get rid of redundant checks
>> - Add check that PHI doesn't contain two non-SSA-name values to
>> stmt_may_generate_copy
>> - Regarding alignment and value ranges in stmt_may_generate_copy: For now use
>> the conservative check that Richard suggested
>> - Index array of vertices that SCC discovery uses by SSA name version numbers
>> instead of numbering statements myself.
>>
>>
>> I didn't make any changes based on these remarks:
>>
>> 1 It might be nice to optimize SCCs of size 1 somehow, not sure how
>> many times these appear - possibly prevent them from even entering
>> the SCC discovery?
>>
>> It would be nice. But the only way to do this that I see right now is to
>> first
>> propagate SCCs of size 1 and then the rest. This would mean adding a new copy
>> propagation procedure. It wouldn't be a trivial procedure. Efficiency of the
>> pass relies on having SCCs topogically sorted so this procedure would have to
>> implement some topological sort algorithm.
>>
>> This could be done. It could save allocating some vec<>s (right now, SCCs of
>> size 1 are represented by a vec<> with a single element). But is it worth it
>> to
>> optimize the pass this way right now? If possible, I'd like to see that the
>> pass works and sort out any problems people encounter with it before I start
>> optimizing it.
>>
>> 2 Instead of collecting all stmts that may generate a copy at the beginning
>> of
>> the pass into a vec<>, let the SCC discovery check that statements may
>> generate a copy on the fly.
>>
>> This would be a big change to the pass, it would require a lot of reworking.
>> I'm also not sure if this would help reduce the number of allocated vec<>s
>> that
>> much because I'll still want to represent SCCs by vec<>s.
>>
>> Again - its possible I'll want to rework the pass in this way in the future
>> but
>> I'd like to leave it as it is for now.
>>
>> 3 Add a comment saying that the pass is doing optimistic copy propagation
>>
>> I don't think the pass works in an optimistic way. It doesn't assume that all
>> variables are copies of each other at any point. It instead identifies copy
>> statements (or PHI SCCs that act as copy statements) and then for each of
>> these
>> it propagates: By that I mean if a copy statement says that _3 is a copy of
>> _2,
>> then it replaces all uses of _3 by _2.
>>
>> But its possible that I still misinterpret what 'optimistic' means. If
>> mentioning that it works in an optimistic way would help readability of the
>> pass, I'm happy to add that comment.
>>
>> Richard, is the patch ok as it is now? Or do you think that making changes
>> 1, 2
>> or 3 is necessary? Or are there any other issues which should be adressed?
>>
>> Filip
>
> This is the fourth version of this patch. Changes in this version:
> - Disabled the pass for -Og
> - This meant that I could revert my changes to gcc.dg/tree-ssa/pr79691.c
> - Fixed formatting of the patch (like braces around a single-statment if body)
> - compute_sccs (auto_vec<>...) -> compute_sccs (vec<> ...)
> - auto_vec<gimple *> worklist; worklist = ...; ->
> auto_vec<gimple *> worklist = ...;
>
> I've also checked that there are no memory leak issues. Valgrind doesn't
> detect
> any aditional leaks when running with sccopy vs running without sccopy:
>
> [fkastl@scala playground]$ valgrind $HOME/gcc/inst/bin/gcc
> /home/fkastl/gcc/src/gcc/testsuite/gcc.dg/sccopy-1.c -fgimple -O2
> ==27363== Memcheck, a memory error detector
> ==27363== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
> ==27363== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
> ==27363== Command: /home/fkastl/gcc/inst/bin/gcc
> /home/fkastl/gcc/src/gcc/testsuite/gcc.dg/sccopy-1.c -fgimple -O2
> ==27363==
> ==27363==
> ==27363== HEAP SUMMARY:
> ==27363== in use at exit: 174,234 bytes in 92 blocks
> ==27363== total heap usage: 408 allocs, 316 frees, 228,050 bytes allocated
> ==27363==
> ==27363== LEAK SUMMARY:
> ==27363== definitely lost: 5,935 bytes in 23 blocks
> ==27363== indirectly lost: 82 bytes in 5 blocks
> ==27363== possibly lost: 0 bytes in 0 blocks
> ==27363== still reachable: 168,217 bytes in 64 blocks
> ==27363== of which reachable via heuristic:
> ==27363== newarray : 1,544 bytes in 1 blocks
> ==27363== suppressed: 0 bytes in 0 blocks
> ==27363== Rerun with --leak-check=full to see details of leaked memory
> ==27363==
> ==27363== For lists of detected and suppressed errors, rerun with: -s
> ==27363== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
>
> [fkastl@scala playground]$ valgrind $HOME/gcc/inst/bin/gcc
> /home/fkastl/gcc/src/gcc/testsuite/gcc.dg/sccopy-1.c -fgimple -O2
> -fdisable-tree-sccopy1 -fdisable-tree-sccopy2
> ==27372== Memcheck, a memory error detector
> ==27372== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
> ==27372== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
> ==27372== Command: /home/fkastl/gcc/inst/bin/gcc
> /home/fkastl/gcc/src/gcc/testsuite/gcc.dg/sccopy-1.c -fgimple -O2
> -fdisable-tree-sccopy1 -fdisable-tree-sccopy2
> ==27372==
> cc1: note: disable pass tree-sccopy1 for functions in the range of [0,
> 4294967295]
> cc1: note: disable pass tree-sccopy2 for functions in the range of [0,
> 4294967295]
> ==27372==
> ==27372== HEAP SUMMARY:
> ==27372== in use at exit: 174,282 bytes in 92 blocks
> ==27372== total heap usage: 408 allocs, 316 frees, 228,674 bytes allocated
> ==27372==
> ==27372== LEAK SUMMARY:
> ==27372== definitely lost: 5,935 bytes in 23 blocks
> ==27372== indirectly lost: 82 bytes in 5 blocks
> ==27372== possibly lost: 0 bytes in 0 blocks
> ==27372== still reachable: 168,265 bytes in 64 blocks
> ==27372== of which reachable via heuristic:
> ==27372== newarray : 1,544 bytes in 1 blocks
> ==27372== suppressed: 0 bytes in 0 blocks
> ==27372== Rerun with --leak-check=full to see details of leaked memory
> ==27372==
> ==27372== For lists of detected and suppressed errors, rerun with: -s
> ==27372== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
>
> Richard, I wrote down the bitmap vs hash_set optimizations you suggested so
> that I can work on the pass more later and optimize it.
>
> I didn't yet bootstrap this version. I just regtested it. Will bootstrap and
> regtest on the most current commit. Once that is successfully done, is the
> pass
> OK to be pushed to main?
Yes.
Thanks,
Richard
> Filip
>
> -- >8 --
>
> This patch adds the strongly-connected copy propagation (SCCOPY) pass.
> It is a lightweight GIMPLE copy propagation pass that also removes some
> redundant PHI statements. It handles degenerate PHIs, e.g.:
>
> _5 = PHI <_1>;
> _6 = PHI <_6, _6, _1, _1>;
> _7 = PHI <16, _7>;
> // Replaces occurences of _5 and _6 by _1 and _7 by 16
>
> It also handles more complicated situations, e.g.:
>
> _8 = PHI <_9, _10>;
> _9 = PHI <_8, _10>;
> _10 = PHI <_8, _9, _1>;
> // Replaces occurences of _8, _9 and _10 by _1
>
> gcc/ChangeLog:
>
> * Makefile.in: Added sccopy pass.
> * passes.def: Added sccopy pass before LTO streaming and before
> RTL expansion.
> * tree-pass.h (make_pass_sccopy): Added sccopy pass.
> * gimple-ssa-sccopy.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/sccopy-1.c: New test.
>
> Signed-off-by: Filip Kastl <fka...@suse.cz>
> ---
> gcc/Makefile.in | 1 +
> gcc/gimple-ssa-sccopy.cc | 680 ++++++++++++++++++++++++++++++++
> gcc/passes.def | 2 +
> gcc/testsuite/gcc.dg/sccopy-1.c | 78 ++++
> gcc/tree-pass.h | 1 +
> 5 files changed, 762 insertions(+)
> create mode 100644 gcc/gimple-ssa-sccopy.cc
> create mode 100644 gcc/testsuite/gcc.dg/sccopy-1.c
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 753f2f36618..2e6f2fef1ba 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1497,6 +1497,7 @@ OBJS = \
> gimple-ssa-backprop.o \
> gimple-ssa-isolate-paths.o \
> gimple-ssa-nonnull-compare.o \
> + gimple-ssa-sccopy.o \
> gimple-ssa-split-paths.o \
> gimple-ssa-store-merging.o \
> gimple-ssa-strength-reduction.o \
> diff --git a/gcc/gimple-ssa-sccopy.cc b/gcc/gimple-ssa-sccopy.cc
> new file mode 100644
> index 00000000000..ac5ec32eb32
> --- /dev/null
> +++ b/gcc/gimple-ssa-sccopy.cc
> @@ -0,0 +1,680 @@
> +/* Strongly-connected copy propagation pass for the GNU compiler.
> + Copyright (C) 2023 Free Software Foundation, Inc.
> + Contributed by Filip Kastl <fka...@suse.cz>
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3. If not see
> +<http://www.gnu.org/licenses/>. */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "tree-pass.h"
> +#include "ssa.h"
> +#include "gimple-iterator.h"
> +#include "vec.h"
> +#include "hash-set.h"
> +#include <algorithm>
> +#include "ssa-iterators.h"
> +#include "gimple-fold.h"
> +#include "gimplify.h"
> +#include "tree-cfg.h"
> +#include "tree-eh.h"
> +#include "builtins.h"
> +#include "tree-ssa-dce.h"
> +#include "fold-const.h"
> +
> +/* Strongly connected copy propagation pass.
> +
> + This is a lightweight copy propagation pass that is also able to eliminate
> + redundant PHI statements. The pass considers the following types of copy
> + statements:
> +
> + 1 An assignment statement with a single argument.
> +
> + _3 = _2;
> + _4 = 5;
> +
> + 2 A degenerate PHI statement. A degenerate PHI is a PHI that only refers
> to
> + itself or one other value.
> +
> + _5 = PHI <_1>;
> + _6 = PHI <_6, _6, _1, _1>;
> + _7 = PHI <16, _7>;
> +
> + 3 A set of PHI statements that only refer to each other or to one other
> + value.
> +
> + _8 = PHI <_9, _10>;
> + _9 = PHI <_8, _10>;
> + _10 = PHI <_8, _9, _1>;
> +
> + All of these statements produce copies and can be eliminated from the
> + program. For a copy statement we identify the value it creates a copy of
> + and replace references to the statement with the value -- we propagate the
> + copy.
> +
> + _3 = _2; // Replace all occurences of _3 by _2
> +
> + _8 = PHI <_9, _10>;
> + _9 = PHI <_8, _10>;
> + _10 = PHI <_8, _9, _1>; // Replace all occurences of _8, _9 and _10 by _1
> +
> + To find all three types of copy statements we use an algorithm based on
> + strongly-connected components (SCCs) in dataflow graph. The algorithm was
> + introduced in an article from 2013[1]. We describe the algorithm bellow.
> +
> + To identify SCCs we implement the Robert Tarjan's SCC algorithm. For the
> + SCC computation we wrap potential copy statements in the 'vertex' struct.
> + To each of these statements we also assign a vertex number ('vxnum').
> Since
> + the main algorithm has to be able to compute SCCs of subgraphs of the
> whole
> + dataflow graph we use GIMPLE stmt flags to prevent Tarjan's algorithm from
> + leaving the subgraph.
> +
> + References:
> +
> + [1] Simple and Efficient Construction of Static Single Assignmemnt Form,
> + Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
> + Section 3.2. */
> +
> +/* Bitmap tracking statements which were propagated to be removed at the end
> of
> + the pass. */
> +
> +static bitmap dead_stmts;
> +
> +/* State of vertex during SCC discovery.
> +
> + unvisited Vertex hasn't yet been popped from worklist.
> + vopen DFS has visited vertex for the first time. Vertex has been put
> + on Tarjan stack.
> + closed DFS has backtracked through vertex. At this point, vertex
> + doesn't have any unvisited neighbors.
> + in_scc Vertex has been popped from Tarjan stack. */
> +
> +enum vstate
> +{
> + unvisited,
> + vopen,
> + closed,
> + in_scc
> +};
> +
> +/* Information about a vertex. Used by SCC discovery. */
> +
> +struct vertex
> +{
> + bool active; /* scc_discovery::compute_sccs () only considers a subgraph of
> + the whole dataflow graph. It uses this flag so that it knows
> + which vertices are part of this subgraph. */
> + vstate state;
> + unsigned index;
> + unsigned lowlink;
> +};
> +
> +/* SCC discovery.
> +
> + Used to find SCCs in a dataflow graph. Implements Tarjan's SCC
> + algorithm. */
> +
> +class scc_discovery
> +{
> +public:
> + scc_discovery ();
> + ~scc_discovery ();
> + auto_vec<vec<gimple *>> compute_sccs (vec<gimple *> &stmts);
> +
> +private:
> + unsigned curr_generation = 0;
> + vertex* vertices; /* Indexed by SSA_NAME_VERSION. */
> + auto_vec<unsigned> worklist; /* DFS stack. */
> + auto_vec<unsigned> stack; /* Tarjan stack. */
> +
> + void visit_neighbor (tree neigh_tree, unsigned parent_vxnum);
> +};
> +
> +scc_discovery::scc_discovery ()
> +{
> + /* Create vertex struct for each SSA name. */
> + vertices = XNEWVEC (struct vertex, num_ssa_names);
> + unsigned i = 0;
> + for (i = 0; i < num_ssa_names; i++)
> + vertices[i].active = false;
> +}
> +
> +scc_discovery::~scc_discovery ()
> +{
> + XDELETEVEC (vertices);
> +}
> +
> +/* Part of 'scc_discovery::compute_sccs ()'. */
> +
> +void
> +scc_discovery::visit_neighbor (tree neigh_tree, unsigned parent_version)
> +{
> + if (TREE_CODE (neigh_tree) != SSA_NAME)
> + return; /* Skip any neighbor that isn't an SSA name. */
> + unsigned neigh_version = SSA_NAME_VERSION (neigh_tree);
> +
> + /* Skip neighbors outside the subgraph that Tarjan currently works
> + with. */
> + if (!vertices[neigh_version].active)
> + return;
> +
> + vstate neigh_state = vertices[neigh_version].state;
> + vstate parent_state = vertices[parent_version].state;
> + if (parent_state == vopen) /* We're currently opening parent. */
> + {
> + /* Put unvisited neighbors on worklist. Update lowlink of parent
> + vertex according to indices of neighbors present on stack. */
> + switch (neigh_state)
> + {
> + case unvisited:
> + worklist.safe_push (neigh_version);
> + break;
> + case vopen:
> + case closed:
> + vertices[parent_version].lowlink
> + = std::min (vertices[parent_version].lowlink,
> + vertices[neigh_version].index);
> + break;
> + case in_scc:
> + /* Ignore these edges. */
> + break;
> + }
> + }
> + else if (parent_state == closed) /* We're currently closing parent. */
> + {
> + /* Update lowlink of parent vertex according to lowlinks of
> + children of parent (in terms of DFS tree). */
> + if (neigh_state == closed)
> + {
> + vertices[parent_version].lowlink
> + = std::min (vertices[parent_version].lowlink,
> + vertices[neigh_version].lowlink);
> + }
> + }
> +}
> +
> +/* Compute SCCs in dataflow graph on given statements 'stmts'. Ignore
> + statements outside 'stmts'. Return the SCCs in a reverse topological
> + order.
> +
> + stmt_may_generate_copy () must be true for all statements from 'stmts'!
> */
> +
> +auto_vec<vec<gimple *>>
> +scc_discovery::compute_sccs (vec<gimple *> &stmts)
> +{
> + auto_vec<vec<gimple *>> sccs;
> +
> + for (gimple *stmt : stmts)
> + {
> + unsigned i;
> + switch (gimple_code (stmt))
> + {
> + case GIMPLE_ASSIGN:
> + i = SSA_NAME_VERSION (gimple_assign_lhs (stmt));
> + break;
> + case GIMPLE_PHI:
> + i = SSA_NAME_VERSION (gimple_phi_result (stmt));
> + break;
> + default:
> + gcc_unreachable ();
> + }
> +
> + vertices[i].index = 0;
> + vertices[i].lowlink = 0;
> + vertices[i].state = unvisited;
> + vertices[i].active = true; /* Mark the subgraph we'll be working on so
> + that we don't leave it. */
> +
> + worklist.safe_push (i);
> + }
> +
> + /* Worklist loop. */
> + unsigned curr_index = 0;
> + while (!worklist.is_empty ())
> + {
> + unsigned i = worklist.pop ();
> + gimple *stmt = SSA_NAME_DEF_STMT (ssa_name (i));
> + vstate state = vertices[i].state;
> +
> + if (state == unvisited)
> + {
> + vertices[i].state = vopen;
> +
> + /* Assign index to this vertex. */
> + vertices[i].index = curr_index;
> + vertices[i].lowlink = curr_index;
> + curr_index++;
> +
> + /* Put vertex on stack and also on worklist to be closed later. */
> + stack.safe_push (i);
> + worklist.safe_push (i);
> + }
> + else if (state == vopen)
> + vertices[i].state = closed;
> +
> + /* Visit neighbors of this vertex. */
> + tree op;
> + gphi *phi;
> + switch (gimple_code (stmt))
> + {
> + case GIMPLE_PHI:
> + phi = as_a <gphi *> (stmt);
> + unsigned j;
> + for (j = 0; j < gimple_phi_num_args (phi); j++)
> + {
> + op = gimple_phi_arg_def (phi, j);
> + visit_neighbor (op, i);
> + }
> + break;
> + case GIMPLE_ASSIGN:
> + op = gimple_assign_rhs1 (stmt);
> + visit_neighbor (op, i);
> + break;
> + default:
> + gcc_unreachable ();
> + }
> +
> + /* If we've just closed a root vertex of an scc, pop scc from stack.
> */
> + if (state == vopen && vertices[i].lowlink == vertices[i].index)
> + {
> + vec<gimple *> scc = vNULL;
> +
> + unsigned j;
> + do
> + {
> + j = stack.pop ();
> + scc.safe_push (SSA_NAME_DEF_STMT (ssa_name (j)));
> + vertices[j].state = in_scc;
> + }
> + while (j != i);
> +
> + sccs.safe_push (scc);
> + }
> + }
> +
> + if (!stack.is_empty ())
> + gcc_unreachable ();
> +
> + /* Clear 'active' flags. */
> + for (gimple *stmt : stmts)
> + {
> + unsigned i;
> + switch (gimple_code (stmt))
> + {
> + case GIMPLE_ASSIGN:
> + i = SSA_NAME_VERSION (gimple_assign_lhs (stmt));
> + break;
> + case GIMPLE_PHI:
> + i = SSA_NAME_VERSION (gimple_phi_result (stmt));
> + break;
> + default:
> + gcc_unreachable ();
> + }
> +
> + vertices[i].active = false;
> + }
> +
> + return sccs;
> +}
> +
> +/* Could this statement potentially be a copy statement?
> +
> + This pass only considers statements for which this function returns
> 'true'.
> + Those are basically PHI functions and assignment statements similar to
> +
> + _2 = _1;
> + or
> + _2 = 5; */
> +
> +static bool
> +stmt_may_generate_copy (gimple *stmt)
> +{
> + /* A PHI may generate a copy. */
> + if (gimple_code (stmt) == GIMPLE_PHI)
> + {
> + gphi *phi = as_a <gphi *> (stmt);
> +
> + /* No OCCURS_IN_ABNORMAL_PHI SSA names in lhs nor rhs. */
> + if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_phi_result (phi)))
> + return false;
> +
> + unsigned i;
> + for (i = 0; i < gimple_phi_num_args (phi); i++)
> + {
> + tree op = gimple_phi_arg_def (phi, i);
> + if (TREE_CODE (op) == SSA_NAME
> + && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op))
> + return false;
> + }
> +
> + /* If PHI has more than one unique non-SSA arguments, it won't
> generate a
> + copy. */
> + tree const_op = NULL_TREE;
> + for (i = 0; i < gimple_phi_num_args (phi); i++)
> + {
> + tree op = gimple_phi_arg_def (phi, i);
> + if (TREE_CODE (op) != SSA_NAME)
> + {
> + if (const_op && !operand_equal_p (op, const_op))
> + return false;
> + const_op = op;
> + }
> + }
> +
> + return true;
> + }
> +
> + /* Or a statement of type _2 = _1; OR _2 = 5; may generate a copy. */
> +
> + if (!gimple_assign_single_p (stmt))
> + return false;
> +
> + tree lhs = gimple_assign_lhs (stmt);
> + tree rhs = gimple_assign_rhs1 (stmt);
> +
> + if (TREE_CODE (lhs) != SSA_NAME)
> + return false;
> +
> + /* lhs shouldn't flow through any abnormal edges. */
> + if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
> + return false;
> +
> + if (is_gimple_min_invariant (rhs))
> + return true; /* A statement of type _2 = 5;. */
> +
> + if (TREE_CODE (rhs) != SSA_NAME)
> + return false;
> +
> + /* rhs shouldn't flow through any abnormal edges. */
> + if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (rhs))
> + return false;
> +
> + /* It is possible that lhs has more alignment or value range information.
> By
> + propagating we would lose this information. So in the case that
> alignment
> + or value range information differs, we are conservative and do not
> + propagate.
> +
> + FIXME: Propagate alignment and value range info the same way copy-prop
> + does. */
> + if (POINTER_TYPE_P (TREE_TYPE (lhs))
> + && POINTER_TYPE_P (TREE_TYPE (rhs))
> + && SSA_NAME_PTR_INFO (lhs) != SSA_NAME_PTR_INFO (rhs))
> + return false;
> + if (!POINTER_TYPE_P (TREE_TYPE (lhs))
> + && !POINTER_TYPE_P (TREE_TYPE (rhs))
> + && SSA_NAME_RANGE_INFO (lhs) != SSA_NAME_RANGE_INFO (rhs))
> + return false;
> +
> + return true; /* A statement of type _2 = _1;. */
> +}
> +
> +/* Return all statements in cfun that could generate copies. All statements
> + for which stmt_may_generate_copy returns 'true'. */
> +
> +static auto_vec<gimple *>
> +get_all_stmt_may_generate_copy (void)
> +{
> + auto_vec<gimple *> result;
> +
> + basic_block bb;
> + FOR_EACH_BB_FN (bb, cfun)
> + {
> + gimple_stmt_iterator gsi;
> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> + {
> + gimple *s = gsi_stmt (gsi);
> + if (stmt_may_generate_copy (s))
> + result.safe_push (s);
> + }
> +
> + gphi_iterator pi;
> + for (pi = gsi_start_phis (bb); !gsi_end_p (pi); gsi_next (&pi))
> + {
> + gimple *s = pi.phi ();
> + if (stmt_may_generate_copy (s))
> + result.safe_push (s);
> + }
> + }
> +
> + return result;
> +}
> +
> +/* For each statement from given SCC, replace its usages by value
> + VAL. */
> +
> +static void
> +replace_scc_by_value (vec<gimple *> scc, tree val)
> +{
> + for (gimple *stmt : scc)
> + {
> + tree name = gimple_get_lhs (stmt);
> + replace_uses_by (name, val);
> + bitmap_set_bit (dead_stmts, SSA_NAME_VERSION (name));
> + }
> +
> + if (dump_file)
> + fprintf (dump_file, "Replacing SCC of size %d\n", scc.length ());
> +}
> +
> +/* Part of 'sccopy_propagate ()'. */
> +
> +static void
> +sccopy_visit_op (tree op, hash_set<tree> &outer_ops,
> + hash_set<gimple *> &scc_set, bool &is_inner,
> + tree &last_outer_op)
> +{
> + bool op_in_scc = false;
> +
> + if (TREE_CODE (op) == SSA_NAME)
> + {
> + gimple *op_stmt = SSA_NAME_DEF_STMT (op);
> + if (scc_set.contains (op_stmt))
> + op_in_scc = true;
> + }
> +
> + if (!op_in_scc)
> + {
> + outer_ops.add (op);
> + last_outer_op = op;
> + is_inner = false;
> + }
> +}
> +
> +/* Main function of this pass. Find and propagate all three types of copy
> + statements (see pass description above).
> +
> + This is an implementation of an algorithm from the paper Simple and
> + Efficient Construction of Static Single Assignmemnt Form[1]. It is based
> + on strongly-connected components (SCCs) in dataflow graph. The original
> + algorithm only considers PHI statements. We extend it to also consider
> + assignment statements of type _2 = _1;.
> +
> + The algorithm is based on this definition of a set of redundant PHIs[1]:
> +
> + A non-empty set P of PHI functions is redundant iff the PHI functions
> just
> + reference each other or one other value
> +
> + It uses this lemma[1]:
> +
> + Let P be a redundant set of PHI functions. Then there is a
> + strongly-connected component S subset of P that is also redundant.
> +
> + The algorithm works in this way:
> +
> + 1 Find SCCs
> + 2 For each SCC S in topological order:
> + 3 Construct set 'inner' of statements that only have other statements
> + from S on their right hand side
> + 4 Construct set 'outer' of values that originate outside S and appear
> on
> + right hand side of some statement from S
> + 5 If |outer| = 1, outer only contains a value v. Statements in S only
> + refer to each other or to v -- they are redundant. Propagate v.
> + Else, recurse on statements in inner.
> +
> + The implementation is non-recursive.
> +
> + References:
> +
> + [1] Simple and Efficient Construction of Static Single Assignmemnt Form,
> + Braun, Buchwald, Hack, Leissa, Mallon, Zwinkau, 2013, LNCS vol. 7791,
> + Section 3.2. */
> +
> +static void
> +sccopy_propagate ()
> +{
> + auto_vec<gimple *> useful_stmts = get_all_stmt_may_generate_copy ();
> + scc_discovery discovery;
> +
> + auto_vec<vec<gimple *>> worklist = discovery.compute_sccs (useful_stmts);
> +
> + while (!worklist.is_empty ())
> + {
> + vec<gimple *> scc = worklist.pop ();
> +
> + auto_vec<gimple *> inner;
> + hash_set<tree> outer_ops;
> + tree last_outer_op = NULL_TREE;
> +
> + /* Prepare hash set of PHIs in scc to query later. */
> + hash_set<gimple *> scc_set;
> + for (gimple *stmt : scc)
> + scc_set.add (stmt);
> +
> + for (gimple *stmt : scc)
> + {
> + bool is_inner = true;
> +
> + gphi *phi;
> + tree op;
> +
> + switch (gimple_code (stmt))
> + {
> + case GIMPLE_PHI:
> + phi = as_a <gphi *> (stmt);
> + unsigned j;
> + for (j = 0; j < gimple_phi_num_args (phi); j++)
> + {
> + op = gimple_phi_arg_def (phi, j);
> + sccopy_visit_op (op, outer_ops, scc_set, is_inner,
> + last_outer_op);
> + }
> + break;
> + case GIMPLE_ASSIGN:
> + op = gimple_assign_rhs1 (stmt);
> + sccopy_visit_op (op, outer_ops, scc_set, is_inner,
> + last_outer_op);
> + break;
> + default:
> + gcc_unreachable ();
> + }
> +
> + if (is_inner)
> + inner.safe_push (stmt);
> + }
> +
> + if (outer_ops.elements () == 1)
> + {
> + /* The only operand in outer_ops. */
> + tree outer_op = last_outer_op;
> + replace_scc_by_value (scc, outer_op);
> + }
> + else if (outer_ops.elements () > 1)
> + {
> + /* Add inner sccs to worklist. */
> + auto_vec<vec<gimple *>> inner_sccs
> + = discovery.compute_sccs (inner);
> + for (vec<gimple *> inner_scc : inner_sccs)
> + worklist.safe_push (inner_scc);
> + }
> + else
> + gcc_unreachable ();
> +
> + scc.release ();
> + }
> +}
> +
> +/* Called when pass execution starts. */
> +
> +static void
> +init_sccopy (void)
> +{
> + /* For propagated statements. */
> + dead_stmts = BITMAP_ALLOC (NULL);
> +}
> +
> +/* Called before pass execution ends. */
> +
> +static void
> +finalize_sccopy (void)
> +{
> + /* Remove all propagated statements. */
> + simple_dce_from_worklist (dead_stmts);
> + BITMAP_FREE (dead_stmts);
> +
> + /* Propagating a constant may create dead eh edges. */
> + basic_block bb;
> + FOR_EACH_BB_FN (bb, cfun)
> + gimple_purge_dead_eh_edges (bb);
> +}
> +
> +namespace {
> +
> +const pass_data pass_data_sccopy =
> +{
> + GIMPLE_PASS, /* type */
> + "sccopy", /* name */
> + OPTGROUP_NONE, /* optinfo_flags */
> + TV_NONE, /* tv_id */
> + ( PROP_cfg | PROP_ssa ), /* properties_required */
> + 0, /* properties_provided */
> + 0, /* properties_destroyed */
> + 0, /* todo_flags_start */
> + TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */
> +};
> +
> +class pass_sccopy : public gimple_opt_pass
> +{
> +public:
> + pass_sccopy (gcc::context *ctxt)
> + : gimple_opt_pass (pass_data_sccopy, ctxt)
> + {}
> +
> + /* opt_pass methods: */
> + virtual bool gate (function *) { return true; }
> + virtual unsigned int execute (function *);
> + opt_pass * clone () final override { return new pass_sccopy (m_ctxt); }
> +}; // class pass_sccopy
> +
> +unsigned
> +pass_sccopy::execute (function *)
> +{
> + init_sccopy ();
> + sccopy_propagate ();
> + finalize_sccopy ();
> + return 0;
> +}
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_sccopy (gcc::context *ctxt)
> +{
> + return new pass_sccopy (ctxt);
> +}
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 1e1950bdb39..048ad3ee7a5 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -100,6 +100,7 @@ along with GCC; see the file COPYING3. If not see
> NEXT_PASS (pass_if_to_switch);
> NEXT_PASS (pass_convert_switch);
> NEXT_PASS (pass_cleanup_eh);
> + NEXT_PASS (pass_sccopy);
> NEXT_PASS (pass_profile);
> NEXT_PASS (pass_local_pure_const);
> NEXT_PASS (pass_modref);
> @@ -368,6 +369,7 @@ along with GCC; see the file COPYING3. If not see
> However, this also causes us to misdiagnose cases that should be
> real warnings (e.g., testsuite/gcc.dg/pr18501.c). */
> NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
> + NEXT_PASS (pass_sccopy);
> NEXT_PASS (pass_tail_calls);
> /* Split critical edges before late uninit warning to reduce the
> number of false positives from it. */
> diff --git a/gcc/testsuite/gcc.dg/sccopy-1.c b/gcc/testsuite/gcc.dg/sccopy-1.c
> new file mode 100644
> index 00000000000..1e61a6b320e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/sccopy-1.c
> @@ -0,0 +1,78 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fgimple -fdump-tree-sccopy -O2" } */
> +/* { dg-final { scan-tree-dump "Replacing SCC of size 2" "sccopy1" } } */
> +
> +int __GIMPLE (ssa, startwith ("sccopy"))
> +main ()
> +{
> + int a;
> + int y;
> + int x;
> + int _1;
> + int _2;
> + int _13;
> +
> + __BB(2):
> + if (x_7(D) == 5)
> + goto __BB3;
> + else
> + goto __BB4;
> +
> + __BB(3):
> + a_10 = x_7(D);
> + goto __BB5;
> +
> + __BB(4):
> + a_9 = y_8(D);
> + goto __BB5;
> +
> + __BB(5):
> + a_3 = __PHI (__BB3: a_10, __BB4: a_9);
> + if (x_7(D) == y_8(D))
> + goto __BB6;
> + else
> + goto __BB11;
> +
> + __BB(6):
> + a_11 = a_3 + 1;
> + goto __BB7;
> +
> + __BB(7):
> + a_4 = __PHI (__BB6: a_11, __BB11: a_6);
> +label1:
> + if (x_7(D) != y_8(D))
> + goto __BB8;
> + else
> + goto __BB10;
> +
> + __BB(8):
> + goto __BB9;
> +
> + __BB(9):
> + a_12 = __PHI (__BB8: a_4, __BB10: a_5);
> + goto __BB10;
> +
> + __BB(10,loop_header(1)):
> + a_5 = __PHI (__BB7: a_4, __BB9: a_12);
> +label2:
> + _1 = y_8(D) * 2;
> + if (x_7(D) == _1)
> + goto __BB9;
> + else
> + goto __BB11;
> +
> + __BB(11):
> + a_6 = __PHI (__BB5: a_3, __BB10: a_5);
> + _2 = x_7(D) * 3;
> + if (y_8(D) == _2)
> + goto __BB7;
> + else
> + goto __BB12;
> +
> + __BB(12):
> + _13 = 0;
> + return _13;
> +
> +}
> +
> +
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 09e6ada5b2f..94a48461590 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -399,6 +399,7 @@ extern gimple_opt_pass *make_pass_iv_optimize
> (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_sccopy (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_split_paths (gcc::context *ctxt);
> extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
> --
> 2.43.0
>