https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77498
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
For a testcase trying to show the issue:

double U[1024];
double V[1024];

void foo (void)
{
  for (unsigned i = 1; i < 1023; ++i)
    V[i] = U[i-1] + U[i] + U[i+1];
}

we get from PRE (.optimized, w/ IVO disabled):



  <bb 2> [1.00%]:
  pretmp_19 = U[0];
  pretmp_21 = U[1];

  <bb 3> [99.00%]:
  # i_15 = PHI <_5(3), 1(2)>
  # prephitmp_20 = PHI <prephitmp_22(3), pretmp_19(2)>
  # prephitmp_22 = PHI <_6(3), pretmp_21(2)>
  # ivtmp_1 = PHI <ivtmp_18(3), 1022(2)>
  _5 = i_15 + 1;
  _6 = U[_5];
  _17 = _6 + prephitmp_22;
  _7 = _17 + prephitmp_20;
  V[i_15] = _7;
  ivtmp_18 = ivtmp_1 + 4294967295;
  if (ivtmp_18 != 0)
    goto <bb 3>; [98.99%]
  else
    goto <bb 4>; [1.01%]

  <bb 4> [1.00%]:
  return;

while predcom does the same transform but unrolls the loop:

  <bb 3> [50.00%]:
  # i_15 = PHI <1(2), _43(3)>
  # ivtmp_18 = PHI <1022(2), ivtmp_49(3)>
  # U_I_lsm0.3_30 = PHI <_33(2), _6(3)>
  # U_I_lsm1.4_31 = PHI <_34(2), _44(3)>
  # ivtmp_50 = PHI <1021(2), ivtmp_51(3)>
  _5 = i_15 + 1;
  _6 = U[_5];
  _37 = U_I_lsm0.3_30 + U_I_lsm1.4_31;
  _7 = _6 + _37;
  V[i_15] = _7;
  _43 = i_15 + 2;
  _44 = U[_43];
  _46 = U_I_lsm1.4_31 + _44;
  _47 = _6 + _46;
  V[_5] = _47;
  ivtmp_49 = ivtmp_18 + 4294967294;
  ivtmp_51 = ivtmp_50 + 4294967294;
  if (ivtmp_51 > 1)
    goto <bb 3>; [98.00%]
  else
    goto <bb 4>; [2.00%]

register pressure created by both are the same.

For the more complex testcase PRE misses the "combination chains" because
we exclude them as

Found partial redundancy for expression {plus_expr,_2,_3} (0012)
Skipping insertion of phi for partial redundancy: Looks like an induction
variable

when we mitigate that PRE can handle all cases where association is correct.
predictive commoning in addition to that does re-association of adds to
enable more chains (so the mitigation doesn't help for the original testcase).

Testcase that is helped this way:

double U[1024], W[1024];
double V[1024];

void foo (void)
{
  for (unsigned i = 1; i < 1023; ++i)
    V[i] = (U[i-1] + W[i-1]) + (U[i] + W[i]) + (U[i+1] + W[i+1]);
}

and PRE produces

  <bb 3> [99.00%]:
  # i_21 = PHI <_9(3), 1(2)>
  # prephitmp_43 = PHI <_23(3), _42(2)>
  # prephitmp_45 = PHI <prephitmp_43(3), _44(2)>
  # ivtmp_40 = PHI <ivtmp_38(3), 1022(2)>
  _9 = i_21 + 1;
  _10 = U[_9];
  _11 = W[_9];
  _23 = _10 + _11;
  _1 = _23 + prephitmp_43;
  _13 = _1 + prephitmp_45;
  V[i_21] = _13;
  ivtmp_38 = ivtmp_40 + 4294967295;
  if (ivtmp_38 != 0)
    goto <bb 3>; [98.99%]
  else
    goto <bb 4>; [1.01%]

note how we PRE U[] + W[] rather than U[] and W[].

Index: gcc/tree-ssa-pre.c
===================================================================
--- gcc/tree-ssa-pre.c  (revision 245594)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -3008,7 +3008,9 @@ insert_into_preds_of_block (basic_block
                                                EDGE_PRED (block, 1)->src);
       /* Induction variables only have one edge inside the loop.  */
       if ((firstinsideloop ^ secondinsideloop)
-         && expr->kind != REFERENCE)
+         && expr->kind != REFERENCE
+         && (expr->kind != NARY
+             || INTEGRAL_TYPE_P (PRE_EXPR_NARY (expr)->type)))
        {
          if (dump_file && (dump_flags & TDF_DETAILS))
            fprintf (dump_file, "Skipping insertion of phi for partial
redundancy: Looks like an induction variable\n");

for the original testcase in this bug this removes two IVs (but the IV
detection is lame).  What we mostly want to avoid here is creating
IVs for derived IVs, it might be enough to disregard

          && expr->kind == NARY
          && (PRE_EXPR_NARY (expr)->opcode == PLUS_EXPR
              || PRE_EXPR_NARY (expr)->opcode == POINTER_PLUS_EXPR
              || PRE_EXPR_NARY (expr)->opcode == MINUS_EXPR)
          && TREE_CODE (PRE_EXPR_NARY (expr)->op[1]) == INTEGER_CST)

but as said, a solution inside PRE might improve some cases but it won't
catch the cases needing re-association.  Running pcom before PRE would
be best here but then we've been there before (and put pcom after
vectorization from previously before it).

Reply via email to