https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77498
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> --- For a testcase trying to show the issue: double U[1024]; double V[1024]; void foo (void) { for (unsigned i = 1; i < 1023; ++i) V[i] = U[i-1] + U[i] + U[i+1]; } we get from PRE (.optimized, w/ IVO disabled): <bb 2> [1.00%]: pretmp_19 = U[0]; pretmp_21 = U[1]; <bb 3> [99.00%]: # i_15 = PHI <_5(3), 1(2)> # prephitmp_20 = PHI <prephitmp_22(3), pretmp_19(2)> # prephitmp_22 = PHI <_6(3), pretmp_21(2)> # ivtmp_1 = PHI <ivtmp_18(3), 1022(2)> _5 = i_15 + 1; _6 = U[_5]; _17 = _6 + prephitmp_22; _7 = _17 + prephitmp_20; V[i_15] = _7; ivtmp_18 = ivtmp_1 + 4294967295; if (ivtmp_18 != 0) goto <bb 3>; [98.99%] else goto <bb 4>; [1.01%] <bb 4> [1.00%]: return; while predcom does the same transform but unrolls the loop: <bb 3> [50.00%]: # i_15 = PHI <1(2), _43(3)> # ivtmp_18 = PHI <1022(2), ivtmp_49(3)> # U_I_lsm0.3_30 = PHI <_33(2), _6(3)> # U_I_lsm1.4_31 = PHI <_34(2), _44(3)> # ivtmp_50 = PHI <1021(2), ivtmp_51(3)> _5 = i_15 + 1; _6 = U[_5]; _37 = U_I_lsm0.3_30 + U_I_lsm1.4_31; _7 = _6 + _37; V[i_15] = _7; _43 = i_15 + 2; _44 = U[_43]; _46 = U_I_lsm1.4_31 + _44; _47 = _6 + _46; V[_5] = _47; ivtmp_49 = ivtmp_18 + 4294967294; ivtmp_51 = ivtmp_50 + 4294967294; if (ivtmp_51 > 1) goto <bb 3>; [98.00%] else goto <bb 4>; [2.00%] register pressure created by both are the same. For the more complex testcase PRE misses the "combination chains" because we exclude them as Found partial redundancy for expression {plus_expr,_2,_3} (0012) Skipping insertion of phi for partial redundancy: Looks like an induction variable when we mitigate that PRE can handle all cases where association is correct. predictive commoning in addition to that does re-association of adds to enable more chains (so the mitigation doesn't help for the original testcase). Testcase that is helped this way: double U[1024], W[1024]; double V[1024]; void foo (void) { for (unsigned i = 1; i < 1023; ++i) V[i] = (U[i-1] + W[i-1]) + (U[i] + W[i]) + (U[i+1] + W[i+1]); } and PRE produces <bb 3> [99.00%]: # i_21 = PHI <_9(3), 1(2)> # prephitmp_43 = PHI <_23(3), _42(2)> # prephitmp_45 = PHI <prephitmp_43(3), _44(2)> # ivtmp_40 = PHI <ivtmp_38(3), 1022(2)> _9 = i_21 + 1; _10 = U[_9]; _11 = W[_9]; _23 = _10 + _11; _1 = _23 + prephitmp_43; _13 = _1 + prephitmp_45; V[i_21] = _13; ivtmp_38 = ivtmp_40 + 4294967295; if (ivtmp_38 != 0) goto <bb 3>; [98.99%] else goto <bb 4>; [1.01%] note how we PRE U[] + W[] rather than U[] and W[]. Index: gcc/tree-ssa-pre.c =================================================================== --- gcc/tree-ssa-pre.c (revision 245594) +++ gcc/tree-ssa-pre.c (working copy) @@ -3008,7 +3008,9 @@ insert_into_preds_of_block (basic_block EDGE_PRED (block, 1)->src); /* Induction variables only have one edge inside the loop. */ if ((firstinsideloop ^ secondinsideloop) - && expr->kind != REFERENCE) + && expr->kind != REFERENCE + && (expr->kind != NARY + || INTEGRAL_TYPE_P (PRE_EXPR_NARY (expr)->type))) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "Skipping insertion of phi for partial redundancy: Looks like an induction variable\n"); for the original testcase in this bug this removes two IVs (but the IV detection is lame). What we mostly want to avoid here is creating IVs for derived IVs, it might be enough to disregard && expr->kind == NARY && (PRE_EXPR_NARY (expr)->opcode == PLUS_EXPR || PRE_EXPR_NARY (expr)->opcode == POINTER_PLUS_EXPR || PRE_EXPR_NARY (expr)->opcode == MINUS_EXPR) && TREE_CODE (PRE_EXPR_NARY (expr)->op[1]) == INTEGER_CST) but as said, a solution inside PRE might improve some cases but it won't catch the cases needing re-association. Running pcom before PRE would be best here but then we've been there before (and put pcom after vectorization from previously before it).