Loop vecotorization can not optimize following case due to SCEV is not affine failure (i+offset may overflow):
int A[1024 * 2]; int foo (unsigned offset, unsigned N) { int sum = 0; for (unsigned i = 0; i < N; i++) sum += A[i + offset]; return sum; } Actually, niter pass can find nonwrapping induction variables (i.e., i + offset can not overflow) by inferring from undefined behaviors like array access (see record_nonwrapping_iv). But this information is not shared to SCEV yet. This patch adds a nonwrapping flag to the chrec tree, which allows SCEV to re-use it to pass the nonwrapping checks like scev_probably_wraps_p, and finaly loop vectorization could succeed. The new flag is defined as CHREC_NOWRAP(tree), and the dump info is changed from "{offset, +, 1}_1" -> "{offset, +, 1}<nw>_1" (nw is short for nonwrapping). Two SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are added to set and check the flag respectively. However, an extra problem is caused by resetting SCEV cache (i.e., scev_reset or reset_scev_htab), which may not be synchronized with the calling of free_numbers_of_iterations_estimates, which set the loop->estimate_state to EST_NOT_COMPUTED and make sure the above inferring from array access is called. In other words, the nonwrapping flag could be cleared and lost by resetting SCEV cache if the loop->estimate_state is not reset. E.g., gimple_duplicate_loop_body_to_header_edge/flush_ssaname_freelist, which calls scev_reset/scev_reset_htab to clear the SCEV cache, but the estimate_state is still kept as EST_AVAILABLE and the flag will not be set in loop vectorization. This patch uses a simple fix by calling free_numbers_of_iterations_estimates in vect_analyze_loop, which will make sure the flag is always set propriately in loop vectorization. This fix is a bit ad-hoc (works for loop vectorization only), if there is more reasonable method, I will revert the simple fix and try that. This patch is bootstrapped and tested on aarch64-linux-gnu with no regressions. OK for trunk? --- The patch is as following: [PATCH] SCEV: extend the chrec tree with a nonwrapping flag [PR112774] The flag is defined as CHREC_NOWRAP(tree), and will be dumped from "{offset, +, 1}_1" to "{offset, +, 1}<nw>_1" (nw is short for nonwrapping). Two SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are added to set and check the flag respectively. As resetting the SCEV cache (i.e., the chrec trees) may not reset the loop->estimate_state, free_numbers_of_iterations_estimates is called explicitly in loop vectorization to make sure the flag can be calculated propriately by niter. gcc/ChangeLog: PR tree-optimization/112774 * tree-pretty-print.cc: if nonwrapping flag is set, chrec will be printed with additional <nw> info. * tree-scalar-evolution.cc: add record_nonwrapping_chrec and nonwrapping_chrec_p to set and check the new flag respectively. * tree-scalar-evolution.h: Likewise. * tree-ssa-loop-niter.cc (idx_infer_loop_bounds, infer_loop_bounds_from_pointer_arith, infer_loop_bounds_from_signedness, scev_probably_wraps_p): call record_nonwrapping_chrec before record_nonwrapping_iv, call nonwrapping_chrec_p to check the flag is set and return false from scev_probably_wraps_p. * tree-vect-loop.cc (vect_analyze_loop): call free_numbers_of_iterations_estimates explicitly. * gcc/tree.h: add CHREC_NOWRAP(NODE), base.nothrow_flag is used to represent the nonwrapping info. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/scev-16.c: New test. --- gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 17 +++++++++++++++++ gcc/tree-pretty-print.cc | 2 +- gcc/tree-scalar-evolution.cc | 24 ++++++++++++++++++++++++ gcc/tree-scalar-evolution.h | 2 ++ gcc/tree-ssa-loop-niter.cc | 21 ++++++++++++++++----- gcc/tree-vect-loop.cc | 4 ++++ gcc/tree.h | 8 +++++--- 7 files changed, 69 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c new file mode 100644 index 00000000000..96ea36e4c65 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-16.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-vect-scev" } */ + +int A[1024 * 2]; + +int foo (unsigned offset, unsigned N) +{ + int sum = 0; + + for (unsigned i = 0; i < N; i++) + sum += A[i + offset]; + + return sum; +} + +/* { dg-final { scan-tree-dump "vec_transform_loop" "vect" } } */ +/* { dg-final { scan-tree-dump-not "missed: failed: evolution of offset is not affine" "vect" } } */ diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc index 1fadd752d05..0dabb6d1580 100644 --- a/gcc/tree-pretty-print.cc +++ b/gcc/tree-pretty-print.cc @@ -3488,7 +3488,7 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags, dump_generic_node (pp, CHREC_LEFT (node), spc, flags, false); pp_string (pp, ", +, "); dump_generic_node (pp, CHREC_RIGHT (node), spc, flags, false); - pp_string (pp, "}_"); + pp_string (pp, !CHREC_NOWRAP (node) ? "}_" : "}<nw>_"); pp_scalar (pp, "%u", CHREC_VARIABLE (node)); is_stmt = false; break; diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc index 065bcd0743d..a9112572e0c 100644 --- a/gcc/tree-scalar-evolution.cc +++ b/gcc/tree-scalar-evolution.cc @@ -2050,6 +2050,30 @@ analyze_scalar_evolution (class loop *loop, tree var) return res; } +/* If CHREC doesn't overflow, set the nonwrapping flag. */ + +void record_nonwrapping_chrec (tree chrec) +{ + CHREC_NOWRAP(chrec) = 1; + + if (dump_file && (dump_flags & TDF_SCEV)) + { + fprintf (dump_file, "(record_nonwrapping_chrec: "); + print_generic_expr (dump_file, chrec); + fprintf (dump_file, ")\n"); + } +} + +/* Return true if CHREC's nonwrapping flag is set. */ + +bool nonwrapping_chrec_p (tree chrec) +{ + if (!chrec || TREE_CODE(chrec) != POLYNOMIAL_CHREC) + return false; + + return CHREC_NOWRAP(chrec); +} + /* Analyzes and returns the scalar evolution of VAR address in LOOP. */ static tree diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h index a64ed78fe63..f57fde12ee2 100644 --- a/gcc/tree-scalar-evolution.h +++ b/gcc/tree-scalar-evolution.h @@ -43,6 +43,8 @@ extern bool simple_iv (class loop *, class loop *, tree, struct affine_iv *, bool); extern bool iv_can_overflow_p (class loop *, tree, tree, tree); extern tree compute_overall_effect_of_inner_loop (class loop *, tree); +extern void record_nonwrapping_chrec (tree); +extern bool nonwrapping_chrec_p (tree); /* Returns the basic block preceding LOOP, or the CFG entry block when the loop is function's body. */ diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc index 2098bef9a97..d465e0ed7e1 100644 --- a/gcc/tree-ssa-loop-niter.cc +++ b/gcc/tree-ssa-loop-niter.cc @@ -4206,11 +4206,15 @@ idx_infer_loop_bounds (tree base, tree *idx, void *dta) /* If access is not executed on every iteration, we must ensure that overlow may not make the access valid later. */ - if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt)) - && scev_probably_wraps_p (NULL_TREE, - initial_condition_in_loop_num (ev, loop->num), - step, data->stmt, loop, true)) - upper = false; + if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))) + { + if (scev_probably_wraps_p (NULL_TREE, + initial_condition_in_loop_num (ev, loop->num), + step, data->stmt, loop, true)) + upper = false; + } + else + record_nonwrapping_chrec (ev); record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper); return true; @@ -4324,6 +4328,7 @@ infer_loop_bounds_from_pointer_arith (class loop *loop, gimple *stmt) if (flag_delete_null_pointer_checks && int_cst_value (low) == 0) low = build_int_cstu (TREE_TYPE (low), TYPE_ALIGN_UNIT (TREE_TYPE (type))); + record_nonwrapping_chrec (scev); record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true); } @@ -4371,6 +4376,7 @@ infer_loop_bounds_from_signedness (class loop *loop, gimple *stmt) high = wide_int_to_tree (type, r.upper_bound ()); } + record_nonwrapping_chrec (scev); record_nonwrapping_iv (loop, base, step, stmt, low, high, false, true); } @@ -5505,6 +5511,11 @@ scev_probably_wraps_p (tree var, tree base, tree step, if (loop_exits_before_overflow (base, step, at_stmt, loop)) return false; + /* Check the nonwrapping flag, which may be set by niter analysis (e.g., the + above loop exits before overflow). */ + if (var && nonwrapping_chrec_p (analyze_scalar_evolution (loop, var))) + return false; + /* At this point we still don't have a proof that the iv does not overflow: give up. */ return true; diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index dd584ab4a42..6261cd1be1d 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -3570,6 +3570,10 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared) analysis are done under the assumptions. */ loop_constraint_set (loop, LOOP_C_FINITE); } + else + /* Clear the existing niter information to make sure the nonwrapping flag + will be calculated and set propriately. */ + free_numbers_of_iterations_estimates (loop); auto_vector_modes vector_modes; /* Autodetect first vector size we try. */ diff --git a/gcc/tree.h b/gcc/tree.h index 086b55f0375..59af8920f02 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -1438,9 +1438,11 @@ class auto_suppress_location_wrappers #define COND_EXPR_ELSE(NODE) (TREE_OPERAND (COND_EXPR_CHECK (NODE), 2)) /* Accessors for the chains of recurrences. */ -#define CHREC_LEFT(NODE) TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0) -#define CHREC_RIGHT(NODE) TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1) -#define CHREC_VARIABLE(NODE) POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var +#define CHREC_LEFT(NODE) TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 0) +#define CHREC_RIGHT(NODE) TREE_OPERAND (POLYNOMIAL_CHREC_CHECK (NODE), 1) +#define CHREC_VARIABLE(NODE) POLYNOMIAL_CHREC_CHECK (NODE)->base.u.chrec_var +/* Nonzero if this chrec doesn't overflow (i.e., nonwrapping). */ +#define CHREC_NOWRAP(NODE) POLYNOMIAL_CHREC_CHECK (NODE)->base.nothrow_flag /* LABEL_EXPR accessor. This gives access to the label associated with the given label expression. */ -- 2.34.1