https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100756

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-10-21
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
            Summary|vect: Superfluous epilog    |[12/13 Regression] vect:
                   |created on s390x            |Superfluous epilog created
                   |                            |on s390x
                 CC|                            |amacleod at redhat dot com
   Target Milestone|---                         |12.3
           Keywords|                            |missed-optimization

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think the issue is that we now have

  <bb 2> [local count: 118111600]:
  _15 = n_8(D) * 4;
  if (n_8(D) > 0)
    goto <bb 5>; [89.00%]
  else
    goto <bb 7>; [11.00%]

while we probably had

  <bb 2> [local count: 118111600]:
  _15 = n_8(D) * 4;
  if (_15 > 0)
    goto <bb 5>; [89.00%]
  else
    goto <bb 7>; [11.00%]

before the change.  Loop header copying applies VN to the copied blocks:

Processing block 0: BB6
Value numbering stmt = i_9 = PHI <0(2)>
Setting value number of i_9 to 0 (changed)
Replaced redundant PHI node defining i_9 with 0
Value numbering stmt = result_14 = PHI <0(2)>
Setting value number of result_14 to 0 (changed)
Replaced redundant PHI node defining result_14 with 0
Value numbering stmt = _15 = n_8(D) * 4;
Setting value number of _15 to _15 (changed)
Making available beyond BB6 _15 for value _15
Value numbering stmt = if (_15 > i_9)
Recording on edge 6->7 _15 gt_expr 0 == true
Recording on edge 6->7 _15 le_expr 0 == false
Recording on edge 6->7 _15 ne_expr 0 == true
Recording on edge 6->7 _15 ge_expr 0 == true
Recording on edge 6->7 _15 lt_expr 0 == false
Recording on edge 6->7 _15 eq_expr 0 == false
marking outgoing edge 6 -> 7 executable
gimple_simplified to if (n_8(D) > 0)

with

  <bb 2> [local count: 118111600]:
  _15 = n_8(D) * 4;
  if (n_8(D) > 0)
    goto <bb 3>; [89.00%]
  else
    goto <bb 4>; [11.00%]

  <bb 3> [local count: 955630225]:
  # i_16 = PHI <i_13(3), 0(2)>
  # result_17 = PHI <result_12(3), 0(2)>
  _1 = (long unsigned int) i_16;
  _2 = _1 * 4;
  _3 = a_11(D) + _2;
  _4 = *_3;
  result_12 = _4 + result_17;
  i_13 = i_16 + 1;
  _5 = n_8(D) * 4;
  if (_5 > i_13)
    goto <bb 3>; [89.00%]
  else
    goto <bb 4>; [11.00%]

so it was a single use in the compare (because CSE only later introduces
more uses through DOM).

The niter code then ends up with maybe-zero as _15 <= 0 and a condition
of n_8(D) > 0 it tries to simplify with tree_simplify_using_condition
(called from simplify_using_initial_conditions).

That old machinery would be a perfect candidate to be rewritten using
path ranger, but in a somewhat extended mode that can "skip" diamonds,
aka, the path just contains dominators of the loop entry edge on which
we want to evaluate the _15 <= 0 condition.

To make the old simplification code work we can do the following:

diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index 1e0f609d8b6..4ffcef4f4ff 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -2216,6 +2216,7 @@ expand_simple_operations (tree expr, tree stop, hash_map<tree, tree> &cache)
 
     case PLUS_EXPR:
     case MINUS_EXPR:
+    case MULT_EXPR:
       if (ANY_INTEGRAL_TYPE_P (TREE_TYPE (expr))
          && TYPE_OVERFLOW_TRAPS (TREE_TYPE (expr)))
        return expr;

but that can of course have unintended side-effects elsewhere (this
function is also used by IVOPTs).
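
For illustration only, here is a C sketch of a loop whose shape matches
the GIMPLE in this comment, together with the two forms of the entry
guard.  It is not the testcase attached to the PR: the names sum4,
guard_mult, guard_plain and maybe_zero are hypothetical, and the int
element and return types are an assumption read off the dump (the index
is scaled by 4 and the bound is compared as signed).  It shows that the
maybe-zero condition only becomes comparable to n > 0 once _15 is
expanded to n * 4, which is what adding MULT_EXPR to
expand_simple_operations allows.

```c
/* Hypothetical reduction loop: the bound n * 4 plays the role of _15
   (and _5 inside the loop body) in the dump.  */
int sum4 (const int *a, int n)
{
  int result = 0;
  for (int i = 0; i < n * 4; i++)
    result += a[i];
  return result;
}

/* Entry guard as presumably emitted before the change: if (_15 > 0).  */
int guard_mult (int n)
{
  return n * 4 > 0;
}

/* Entry guard after gimple_simplified folded it: if (n_8(D) > 0).
   The two agree whenever n * 4 does not overflow, which signed
   arithmetic lets the compiler assume.  */
int guard_plain (int n)
{
  return n > 0;
}

/* maybe-zero with _15 expanded to n * 4.  In this form it is the plain
   negation of guard_mult, and hence of guard_plain, so it simplifies
   to false under the condition n > 0.  */
int maybe_zero (int n)
{
  return n * 4 <= 0;
}
```

With an eight-element array and n = 2 the loop in sum4 visits all eight
elements; with n <= 0 it is skipped, matching the guard.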