Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > On Fri, Nov 12, 2021 at 7:05 PM Richard Sandiford via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: >> >> This patch is a prerequisite for a later one. At the moment, >> if-conversion converts predicated POINTER_PLUS_EXPRs into >> non-wrapping forms, which for: >> >> … = base + offset >> >> becomes: >> >> tmp = (unsigned long) base >> … = tmp + offset >> >> It then hoists these conversions out of the loop where possible. >> >> However, because “base” is a valid gimple operand, there can be >> multiple POINTER_PLUS_EXPRs with the same base, which can in turn >> lead to multiple instances of the same conversion. The later VN pass >> is (and I think needs to be) restricted to the new if-converted code, >> whereas here we're deliberately inserting the conversions before the >> .LOOP_VECTORIZED condition: >> >> /* If we versioned loop then make sure to insert invariant >> stmts before the .LOOP_VECTORIZED check since the vectorizer >> will re-use that for things like runtime alias versioning >> whose condition can end up using those invariants. */ >> >> We can therefore enter the vectoriser with redundant conversions. >> >> The easiest fix seemed to be to defer the hoisting until after VN. >> This catches other hoisting opportunities too. >> >> Hoisting the code from the (artificial) loop in pr99102.c means >> that it's no longer worth vectorising. The patch forces vectorisation >> instead of relying on the cost model. >> >> The patch also reverts pr87007-4.c and pr87007-5.c back to their >> original forms, undoing changes in 783dc66f9ccb0019c3dad. >> The code at the time the tests were added was: >> >> testl %edi, %edi >> je .L10 >> vxorps %xmm1, %xmm1, %xmm1 >> vsqrtsd d3(%rip), %xmm1, %xmm0 >> vsqrtsd d2(%rip), %xmm1, %xmm1 >> ... >> .L10: >> ret >> >> with the operations being hoisted, and the vxorps was specifically >> wanted (compared to the previous code). This patch restores the code >> to that form, with the hoisted operations and the vxorps. >> >> Regstrapped on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? >> >> Richard >> >> >> gcc/ >> * tree-if-conv.c: Include tree-eh.h. >> (predicate_statements): Remove pe argument. Don't hoist >> statements here. >> (combine_blocks): Remove pe argument. >> (ifcvt_can_hoist, ifcvt_can_hoist_further): New functions. >> (ifcvt_hoist_invariants): Likewise. >> (tree_if_conversion): Update call to combine_blocks. Call >> ifcvt_hoist_invariants after VN. >> >> gcc/testsuite/ >> * gcc.dg/vect/pr99102.c: Add -fno-vect-cost-model. >> >> Revert: >> >> 2020-09-09 Richard Biener <rguent...@suse.de> >> >> * gcc.target/i386/pr87007-4.c: Adjust. >> * gcc.target/i386/pr87007-5.c: Likewise. >> --- >> gcc/testsuite/gcc.dg/vect/pr99102.c | 2 +- >> gcc/testsuite/gcc.target/i386/pr87007-4.c | 2 +- >> gcc/testsuite/gcc.target/i386/pr87007-5.c | 2 +- >> gcc/tree-if-conv.c | 122 ++++++++++++++++++++-- >> 4 files changed, 114 insertions(+), 14 deletions(-) >> >> diff --git a/gcc/testsuite/gcc.dg/vect/pr99102.c >> b/gcc/testsuite/gcc.dg/vect/pr99102.c >> index 6c1a13f0783..0d030d15c86 100644 >> --- a/gcc/testsuite/gcc.dg/vect/pr99102.c >> +++ b/gcc/testsuite/gcc.dg/vect/pr99102.c >> @@ -1,4 +1,4 @@ >> -/* { dg-options "-O2 -ftree-vectorize -fdump-tree-vect-details" } */ >> +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model >> -fdump-tree-vect-details" } */ >> /* { dg-additional-options "-msve-vector-bits=256" { target >> aarch64_sve256_hw } } */ >> long a[44]; >> short d, e = -7; >> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-4.c >> b/gcc/testsuite/gcc.target/i386/pr87007-4.c >> index 9c4b8005af3..e91bdcbac44 100644 >> --- a/gcc/testsuite/gcc.target/i386/pr87007-4.c >> +++ b/gcc/testsuite/gcc.target/i386/pr87007-4.c >> @@ -15,4 +15,4 @@ foo (int n, int k) >> d1 = ceil (d3); >> } >> >> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */ >> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */ >> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c >> b/gcc/testsuite/gcc.target/i386/pr87007-5.c >> index e4d956a5d7f..20d13cf650b 100644 >> --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c >> +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c >> @@ -15,4 +15,4 @@ foo (int n, int k) >> d1 = sqrt (d3); >> } >> >> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */ >> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */ >> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c >> index e88ddc9f788..0ad557a2f4d 100644 >> --- a/gcc/tree-if-conv.c >> +++ b/gcc/tree-if-conv.c >> @@ -121,6 +121,7 @@ along with GCC; see the file COPYING3. If not see >> #include "tree-cfgcleanup.h" >> #include "tree-ssa-dse.h" >> #include "tree-vectorizer.h" >> +#include "tree-eh.h" >> >> /* Only handle PHIs with no more arguments unless we are asked to by >> simd pragma. */ >> @@ -2496,7 +2497,7 @@ predicate_rhs_code (gassign *stmt, tree mask, tree >> cond, >> */ >> >> static void >> -predicate_statements (loop_p loop, edge pe) >> +predicate_statements (loop_p loop) >> { >> unsigned int i, orig_loop_num_nodes = loop->num_nodes; >> auto_vec<int, 1> vect_sizes; >> @@ -2597,13 +2598,7 @@ predicate_statements (loop_p loop, edge pe) >> { >> gassign *stmt2 = as_a <gassign *> (gsi_stmt (gsi2)); >> gsi_remove (&gsi2, false); >> - /* Make sure to move invariant conversions out of the >> - loop. */ >> - if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt2)) >> - && expr_invariant_in_loop_p (loop, >> - gimple_assign_rhs1 >> (stmt2))) >> - gsi_insert_on_edge_immediate (pe, stmt2); >> - else if (first) >> + if (first) >> { >> gsi_insert_before (&gsi, stmt2, GSI_NEW_STMT); >> first = false; >> @@ -2684,7 +2679,7 @@ remove_conditions_and_labels (loop_p loop) >> blocks. Replace PHI nodes with conditional modify expressions. */ >> >> static void >> -combine_blocks (class loop *loop, edge pe) >> +combine_blocks (class loop *loop) >> { >> basic_block bb, exit_bb, merge_target_bb; >> unsigned int orig_loop_num_nodes = loop->num_nodes; >> @@ -2697,7 +2692,7 @@ combine_blocks (class loop *loop, edge pe) >> predicate_all_scalar_phis (loop); >> >> if (need_to_predicate || need_to_rewrite_undefined) >> - predicate_statements (loop, pe); >> + predicate_statements (loop); >> >> /* Merge basic blocks. */ >> exit_bb = NULL; >> @@ -3181,6 +3176,109 @@ ifcvt_local_dce (class loop *loop) >> } >> } >> >> +/* Return true if STMT can be hoisted from if-converted loop LOOP. */ >> + >> +static bool >> +ifcvt_can_hoist (class loop *loop, gimple *stmt) >> +{ >> + if (auto *call = dyn_cast<gcall *> (stmt)) >> + { >> + if (gimple_call_internal_p (call) >> + && internal_fn_mask_index (gimple_call_internal_fn (call)) >= 0) >> + return false; >> + } >> + else if (auto *assign = dyn_cast<gassign *> (stmt)) >> + { >> + if (gimple_assign_rhs_code (assign) == COND_EXPR) >> + return false; >> + } >> + else >> + return false; >> + >> + if (gimple_has_side_effects (stmt) >> + || gimple_could_trap_p (stmt) >> + || stmt_could_throw_p (cfun, stmt) >> + || gimple_vdef (stmt) >> + || gimple_vuse (stmt)) >> + return false; >> + >> + int num_args = gimple_num_args (stmt); >> + for (int i = 0; i < num_args; ++i) >> + if (!expr_invariant_in_loop_p (loop, gimple_arg (stmt, i))) >> + return false; >> + >> + return true; >> +} >> + >> +/* PE is the preferred hoisting edge selected by tree_if_conversion, which >> + s known to be different from (and to dominate) the preheader edge of the >> + if-converted loop. We already know that STMT can be inserted on the loop >> + preheader edge. Return true if we prefer to insert it on PE instead. */ >> + >> +static bool >> +ifcvt_can_hoist_further (edge pe, gimple *stmt) >> +{ >> + /* As explained in tree_if_conversion, we want to hoist invariant >> + conversions further so that they can be reused by alias analysis. */ >> + auto *assign = dyn_cast<gassign *> (stmt); >> + if (assign >> + && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (assign))) >> + { >> + tree rhs = gimple_assign_rhs1 (assign); >> + if (is_gimple_min_invariant (rhs)) >> + return true; >> + >> + if (TREE_CODE (rhs) == SSA_NAME) >> + { >> + basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (rhs)); >> + if (!def_bb || dominated_by_p (CDI_DOMINATORS, pe->dest, def_bb)) >> + return true; >> + } >> + } >> + return false; >> +} >> + >> +/* Hoist invariant statements from LOOP. PE is the preferred edge for >> + hoisting conversions, as selected by tree_if_conversion; see there >> + for details. */ >> + >> +static void >> +ifcvt_hoist_invariants (class loop *loop, edge pe) >> +{ >> + gimple_stmt_iterator hoist_gsi = {}; >> + gimple_stmt_iterator hoist_gsi_pe = {}; >> + unsigned int num_blocks = loop->num_nodes; >> + basic_block *body = get_loop_body (loop); >> + for (unsigned int i = 0; i < num_blocks; ++i) >> + for (gimple_stmt_iterator gsi = gsi_start_bb (body[i]); !gsi_end_p >> (gsi);) >> + { >> + gimple *stmt = gsi_stmt (gsi); >> + if (ifcvt_can_hoist (loop, stmt)) >> + { >> + /* Once we've hoisted one statement, insert other statements >> + after it. */ >> + edge e = loop_preheader_edge (loop); >> + gimple_stmt_iterator *hoist_gsi_ptr = &hoist_gsi; >> + if (e != pe && ifcvt_can_hoist_further (pe, stmt)) > > One issue with hoisting to loop_preheader_edge instead of 'pe' > is that eventually the loop versioning code will pick up the defs > through the versioning condition and that will break because the > versioning condition will be inserted in place of the > IFN_VECTORIZED_LOOP, which is _before_ the preheader. > The code basically expects the preheader to be empty.
Do you mean that the versioning-for-aliases code assumes that all loop-invariant definitions are available in the versioning condition? I hadn't realised that's what the comment was saying. It sounded more like an optimisation. Thanks, Richard > I don't see how ifcvt_can_hoist_further avoids this for > non-CONVERT_EXPRs. > > So I don't see how this is actually safe? It's probably moderately > easy to trigger issues when you disable both PRE and > loop invariant motion before if-conversion/vectorization. > > Maybe doing something similar as cse_and_gimplify_to_preheader > done in the vectorizer helps for the original problem. Or alternatively > do the hoisting during loop versioning ... > > Richard. > >> + { >> + e = pe; >> + hoist_gsi_ptr = &hoist_gsi_pe; >> + } >> + gsi_remove (&gsi, false); >> + if (hoist_gsi_ptr->ptr) >> + gsi_insert_after (hoist_gsi_ptr, stmt, GSI_NEW_STMT); >> + else >> + { >> + gsi_insert_on_edge_immediate (e, stmt); >> + *hoist_gsi_ptr = gsi_for_stmt (stmt); >> + } >> + continue; >> + } >> + gsi_next (&gsi); >> + } >> + free (body); >> +} >> + >> /* If-convert LOOP when it is legal. For the moment this pass has no >> profitability analysis. Returns non-zero todo flags when something >> changed. */ >> @@ -3275,7 +3373,7 @@ tree_if_conversion (class loop *loop, vec<gimple *> >> *preds) >> /* Now all statements are if-convertible. Combine all the basic >> blocks into one huge basic block doing the if-conversion >> on-the-fly. */ >> - combine_blocks (loop, pe); >> + combine_blocks (loop); >> >> /* Perform local CSE, this esp. helps the vectorizer analysis if loads >> and stores are involved. CSE only the loop body, not the entry >> @@ -3297,6 +3395,8 @@ tree_if_conversion (class loop *loop, vec<gimple *> >> *preds) >> ifcvt_local_dce (loop); >> BITMAP_FREE (exit_bbs); >> >> + ifcvt_hoist_invariants (loop, pe); >> + >> todo |= TODO_cleanup_cfg; >> >> cleanup: >> -- >> 2.25.1 >>