This has changed quite a bit since the previous revision (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01484.html), mostly due to Ada and specifically Ada on ARM.
I didn't find a good alternative to scanning for constant-pool accesses "as we go" through the function, and although I didn't find any of these accesses being disqualified after we'd found them in C, some awkward constant-pool expressions in Ada forced me to disqualify them for a new reason (inability to constant-propagate). Hence, I introduced a new bitmap of disqualified_constants, rather than an additional pass through the function. Next, the approach of using a replacement_expr (with tho constant value) in place of the old replacement_decl, also failed (the decl could be used in an expression where substituting in the expr produced ill-formed gimple). Hence, constant-pool entries now use replacement_decls just like everything else, for which I generate initializers in initialize_constant_pool_replacements. However, I was able to avoid generating the replacement constants early, and to do so late in initialize_constant_pool_replacements; the main trick here was to use fold_array_ctor_reference, kindly pointed out by Richie :), to fold the MEM_REFs built in analyze_access_subtree. Finally, I found completely_scalarize was still failing to fold all the constant expressions, because Ada was putting VIEW_CONVERT_EXPRs in the constant pool, and fold-const.c did not deal with ARRAY_REFs of these. However, requiring completely_scalarize to be able to fold everything, seemed fragile, instead I decoupled from fold-const by allowing to fail. With these changes, ssa-dom-cse-2.c is fixed on all platforms with appropriate --param. There are still a few remaining test failures, on AArch64: gcc.dg/guality/pr54970.c -O1 line 15 a[0] == 1 gcc.dg/guality/pr54970.c -O1 line 20 a[0] == 1 gcc.dg/guality/pr54970.c -O1 line 25 a[0] == 1 (also at -O2, -O2 -flto with/without plugin, -O3 -g and -Os). ...which I'm working on. I also tested with the hack at https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01483.html . This revealed one new failure in advsimd-intrinsics/vldX.c on AArch64 (fixed by https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02777.html), and fixed a bunch of guality failures on ARM: gcc.dg/guality/pr54970.c -O1 line 31 a[0] == 4 gcc.dg/guality/pr54970.c -O1 line 36 a[0] == 4 gcc.dg/guality/pr54970.c -O1 line 45 a[0] == 4 gcc.dg/guality/pr54970.c -O1 line 45 p[-2] == 4 gcc.dg/guality/pr54970.c -O1 line 45 q[-1] == 4 (also at -O2, -O2 -flto with/without plugin, -O3 -g and -Os) --Alan gcc/ChangeLog: * tree-sra.c (disqualified_constants): New. (sra_initialize): Initialize it. (sra_deinitialize): Deallocate it. (disqualify_candidate): Set bit in disqualified_constants. (subst_constant_pool_initial): New. (create_access): Scan for constant-pool entries as we go. (scalarizable_type_p): Disallow types containing placeholders. (completely_scalarize): Return bool to allow failure. (scalarize_elem): Likewise; check we can generate constant replacements. (maybe_add_sra_candidate): Allow constant-pool entries. (analyze_access_subtree): Checking-assert that we can fold any refs built for constant-pool entries. (analyze_all_variable_accesses): Deal with completely_scalarize failing. (initialize_constant_pool_replacements): New. (sra_modify_function_body): Call previous. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Add sra-max-scalarization-size param, remove xfail. --- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c | 7 +- gcc/tree-sra.c | 221 ++++++++++++++++++++++++-- 2 files changed, 208 insertions(+), 20 deletions(-) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c index 9eccdc9..b13d583 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */ +/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized --param sra-max-scalarization-size-Ospeed=32" } */ int foo () @@ -17,7 +17,4 @@ foo () /* After late unrolling the above loop completely DOM should be able to optimize this to return 28. */ -/* See PR63679 and PR64159, if the target forces the initializer to memory then - DOM is not able to perform this optimization. */ - -/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail aarch64*-*-* alpha*-*-* hppa*-*-* powerpc*-*-* sparc*-*-* s390*-*-* } } } */ +/* { dg-final { scan-tree-dump "return 28;" "optimized" } } */ diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index 358db79..1920265 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -90,6 +90,7 @@ along with GCC; see the file COPYING3. If not see #include "gimple-iterator.h" #include "gimplify-me.h" #include "gimple-walk.h" +#include "gimple-fold.h" #include "tree-cfg.h" #include "flags.h" #include "insn-config.h" @@ -336,6 +337,10 @@ candidate (unsigned uid) those which cannot be (because they are and need be used as a whole). */ static bitmap should_scalarize_away_bitmap, cannot_scalarize_away_bitmap; +/* Bitmap of candidates in the constant pool, which cannot be scalarized + because this would produce non-constant expressions (e.g. Ada). */ +static bitmap disqualified_constants; + /* Obstack for creation of fancy names. */ static struct obstack name_obstack; @@ -658,6 +663,7 @@ sra_initialize (void) (vec_safe_length (cfun->local_decls) / 2); should_scalarize_away_bitmap = BITMAP_ALLOC (NULL); cannot_scalarize_away_bitmap = BITMAP_ALLOC (NULL); + disqualified_constants = BITMAP_ALLOC (NULL); gcc_obstack_init (&name_obstack); base_access_vec = new hash_map<tree, auto_vec<access_p> >; memset (&sra_stats, 0, sizeof (sra_stats)); @@ -676,6 +682,7 @@ sra_deinitialize (void) candidates = NULL; BITMAP_FREE (should_scalarize_away_bitmap); BITMAP_FREE (cannot_scalarize_away_bitmap); + BITMAP_FREE (disqualified_constants); access_pool.release (); assign_link_pool.release (); obstack_free (&name_obstack, NULL); @@ -690,6 +697,8 @@ disqualify_candidate (tree decl, const char *reason) { if (bitmap_clear_bit (candidate_bitmap, DECL_UID (decl))) candidates->remove_elt_with_hash (decl, DECL_UID (decl)); + if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl)) + bitmap_set_bit (disqualified_constants, DECL_UID (decl)); if (dump_file && (dump_flags & TDF_DETAILS)) { @@ -841,8 +850,76 @@ create_access_1 (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size) return access; } +/* Given EXPR a chain of COMPONENT_REFs and/or ARRAY_REFs built around a + constant-pool declaration VAR, builds a tree like EXPR but with VAR replaced + by its DECL_INITIAL, and tries to fold it to a constant. If folding succeeds + then return the folded tree, else NULL_TREE. */ + +static tree +subst_constant_pool_initial (tree expr, tree var) +{ + if (TREE_CODE (expr) == VAR_DECL) + { + gcc_assert (DECL_IN_CONSTANT_POOL (expr)); + gcc_assert (expr == var); + return DECL_INITIAL (expr); + } + if (TREE_CODE (expr) == COMPONENT_REF) + { + tree field = TREE_OPERAND (expr, 1); + gcc_assert (TREE_CODE (field) == FIELD_DECL); + gcc_assert (TREE_OPERAND (expr, 2) == NULL_TREE); + tree record = subst_constant_pool_initial (TREE_OPERAND (expr, 0), var); + if (!record) + return NULL_TREE; + tree ref = build3 (COMPONENT_REF, TREE_TYPE (expr), + record, field, NULL_TREE); + tree res = fold (ref); + if (res != ref) + return res; + } + else if (TREE_CODE (expr) == ARRAY_REF) + { + tree array = subst_constant_pool_initial (TREE_OPERAND (expr, 0), var); + if (!array) + return NULL_TREE; + tree ref = build4 (ARRAY_REF, TREE_TYPE (expr), + array, + TREE_OPERAND (expr, 1), + TREE_OPERAND (expr, 2), + TREE_OPERAND (expr, 3)); + /* Ada (at least) builds ARRAY_REFs around strings. */ + if (tree folded = fold_read_from_constant_string (ref)) + return folded; + tree res = fold (ref); + if (res != ref) + return res; + } + else if (TREE_CODE (expr) == MEM_REF) + { + gcc_assert (TREE_CODE (TREE_OPERAND (expr, 0)) == ADDR_EXPR); + tree type = TREE_TYPE (expr); + tree sz = TYPE_SIZE (TREE_TYPE (expr)); + tree val = TREE_OPERAND (TREE_OPERAND (expr, 0), 0); + val = subst_constant_pool_initial (val, var); + if (TREE_CODE (val) != CONSTRUCTOR + || !tree_fits_uhwi_p (TREE_OPERAND (expr, 1)) + || !tree_fits_uhwi_p (sz)) + return NULL_TREE; + unsigned HOST_WIDE_INT offset = + tree_to_uhwi (TREE_OPERAND (expr, 1)) * BITS_PER_UNIT; + return fold_ctor_reference (type, val, offset, tree_to_uhwi (sz), var); + } + + /* Unknown expression, or could not constant-fold. */ + return NULL_TREE; +} + +static bool maybe_add_sra_candidate (tree var); + /* Create and insert access for EXPR. Return created access, or NULL if it is - not possible. */ + not possible. Also scan for uses of constant pool as we go along and add + to candidates. */ static struct access * create_access (tree expr, gimple *stmt, bool write) @@ -865,6 +942,32 @@ create_access (tree expr, gimple *stmt, bool write) else ptr = false; + /* For constant-pool entries, check we can substitute the constant value. */ + if (TREE_CODE (base) == VAR_DECL && DECL_IN_CONSTANT_POOL (base) + && (sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA)) + { + gcc_assert (!write); + if (bitmap_bit_p (disqualified_constants, DECL_UID (base))) + return NULL; + tree repl = subst_constant_pool_initial (expr, base); + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Trying to scalarize constant-pool "); + print_generic_expr (dump_file, base, 0); + fprintf (dump_file, "\n In expression: "); + print_generic_expr (dump_file, expr, 0); + fprintf (dump_file, "\n Constant replacement: "); + print_generic_expr (dump_file, repl, 0); + fprintf (dump_file, "\n"); + } + /* If we can't resolve the expression to a constant, we risk creating + a malformed expression (e.g. Ada testsuite, ACATS chapter C3). */ + if (!repl || !TREE_CONSTANT (repl)) + disqualify_candidate (base, "Cannot constant-propagate out of pool."); + else + maybe_add_sra_candidate (base); + } + if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base))) return NULL; @@ -923,6 +1026,8 @@ static bool scalarizable_type_p (tree type) { gcc_assert (!is_gimple_reg_type (type)); + if (type_contains_placeholder_p (type)) + return false; switch (TREE_CODE (type)) { @@ -970,15 +1075,19 @@ scalarizable_type_p (tree type) } } -static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, tree, tree); +static bool scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, tree, tree); /* Create total_scalarization accesses for all scalar fields of a member of type DECL_TYPE conforming to scalarizable_type_p. BASE must be the top-most VAR_DECL representing the variable; within that, OFFSET locates the member and REF must be the memory reference expression for - the member. */ + the member. -static void + Return TRUE if successful; may fail in the case of constant-pool entry BASE + whose DECL_INITIAL contains values to which we cannot fold ARRAY_REF or + COMPONENT_REF expressions. */ + +static bool completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref) { switch (TREE_CODE (decl_type)) @@ -991,10 +1100,11 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref) tree ft = TREE_TYPE (fld); tree nref = build3 (COMPONENT_REF, ft, ref, fld, NULL_TREE); - scalarize_elem (base, pos, tree_to_uhwi (DECL_SIZE (fld)), nref, - ft); + if (!scalarize_elem (base, pos, tree_to_uhwi (DECL_SIZE (fld)), + nref, ft)) + return false; } - break; + return true; case ARRAY_TYPE: { tree elemtype = TREE_TYPE (decl_type); @@ -1026,12 +1136,13 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref) ref, size_int (idx + min), NULL_TREE, NULL_TREE); int el_off = offset + idx * el_size; - scalarize_elem (base, el_off, el_size, nref, elemtype); + if (!scalarize_elem (base, el_off, el_size, nref, elemtype)) + return false; } while (++idx <= lenlessone); } } - break; + return true; default: gcc_unreachable (); } @@ -1040,22 +1151,32 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref) /* Create total_scalarization accesses for a member of type TYPE, which must satisfy either is_gimple_reg_type or scalarizable_type_p. BASE must be the top-most VAR_DECL representing the variable; within that, POS and SIZE locate - the member and REF must be the reference expression for it. */ + the member and REF must be the reference expression for it. -static void + Return TRUE if successful; may fail in the case of constant-pool entry BASE + whose DECL_INITIAL contains values to which we cannot fold ARRAY_REF or + COMPONENT_REF expressions. */ + +static bool scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, tree ref, tree type) { if (is_gimple_reg_type (type)) { + if (TREE_CODE (base) == VAR_DECL + && DECL_IN_CONSTANT_POOL (base) + && subst_constant_pool_initial (ref, base) == NULL_TREE) + return false; struct access *access = create_access_1 (base, pos, size); access->expr = ref; access->type = type; access->grp_total_scalarization = 1; /* Accesses for intraprocedural SRA can have their stmt NULL. */ + + return true; } - else - completely_scalarize (base, type, pos, ref); + + return completely_scalarize (base, type, pos, ref); } /* Create a total_scalarization access for VAR as a whole. VAR must be of a @@ -1843,7 +1964,12 @@ maybe_add_sra_candidate (tree var) reject (var, "not aggregate"); return false; } - if (needs_to_live_in_memory (var)) + /* Allow constant-pool entries (that "need to live in memory") + unless we are doing IPA SRA. */ + if (needs_to_live_in_memory (var) + && (sra_mode == SRA_MODE_EARLY_IPA + || TREE_CODE (var) != VAR_DECL + || !DECL_IN_CONSTANT_POOL (var))) { reject (var, "needs to live in memory"); return false; @@ -2343,6 +2469,10 @@ analyze_access_subtree (struct access *root, struct access *parent, (unsigned) root->offset, (unsigned) root->size); fprintf (dump_file, " to an integer.\n"); } + gcc_checking_assert (TREE_CODE (root->base) != VAR_DECL + || !DECL_IN_CONSTANT_POOL (root->base) + || subst_constant_pool_initial (root->expr, + root->base)); } root->grp_to_be_replaced = 1; @@ -2604,8 +2734,17 @@ analyze_all_variable_accesses (void) if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) <= max_scalarization_size) { + vec<access_p> &accesses = base_access_vec->get_or_insert (var); + unsigned orig_count = accesses.length (); + if (!completely_scalarize (var, TREE_TYPE (var), 0, var)) + { + /* Failed. Discard any accesses created during attempt. */ + for (unsigned i = orig_count; i < accesses.length (); i++) + access_pool.remove (accesses[i]); + accesses.truncate (orig_count); + continue; + } create_total_scalarization_access (var); - completely_scalarize (var, TREE_TYPE (var), 0, var); if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "Will attempt to totally scalarize "); @@ -3480,6 +3619,56 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi) } } +static void +initialize_constant_pool_replacements (void) +{ + gimple_seq seq = NULL; + gimple_stmt_iterator gsi = gsi_start (seq); + bitmap_iterator bi; + unsigned i; + + EXECUTE_IF_SET_IN_BITMAP (candidate_bitmap, 0, i, bi) + if (bitmap_bit_p (should_scalarize_away_bitmap, i) + && !bitmap_bit_p (cannot_scalarize_away_bitmap, i)) + { + tree var = candidate (i); + if (TREE_CODE (var) != VAR_DECL || !DECL_IN_CONSTANT_POOL (var)) + continue; + vec<access_p> *access_vec = get_base_access_vector (var); + if (!access_vec) + continue; + for (unsigned i = 0; i < access_vec->length (); i++) + { + struct access *access = (*access_vec)[i]; + tree val = subst_constant_pool_initial (access->expr, access->base); + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Expression "); + print_generic_expr (dump_file, access->expr, 0); + fprintf (dump_file, " replaced with "); + print_generic_expr (dump_file, access->replacement_decl, 0); + fprintf (dump_file, " initialized to "); + print_generic_expr (dump_file, val, 0); + fprintf (dump_file, "\n"); + if (!access->replacement_decl) + dump_access (dump_file, access, true); + } + if (!access->replacement_decl) + continue; + gcc_assert (val); + gassign *stmt = gimple_build_assign ( + get_access_replacement (access), val); + gsi_insert_after (&gsi, stmt, GSI_NEW_STMT); + update_stmt (stmt); + } + } + + seq = gsi_seq (gsi); + if (seq) + gsi_insert_seq_on_edge_immediate ( + single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)), seq); +} + /* Traverse the function body and all modifications as decided in analyze_all_variable_accesses. Return true iff the CFG has been changed. */ @@ -3490,6 +3679,8 @@ sra_modify_function_body (void) bool cfg_changed = false; basic_block bb; + initialize_constant_pool_replacements (); + FOR_EACH_BB_FN (bb, cfun) { gimple_stmt_iterator gsi = gsi_start_bb (bb); -- 1.9.1