https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69983

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> So we fail analyzing the loop nest because a) the loop header check of the
> inner loop makes it conditionally executed, b) we have outer loop IV computes
> i+-1 guarded by that check.
> 
> For a/b) it would be best if the outer loop were tail duplicated.
> 
> Looking at the other reported fallout now.

That's unrelated.

So the CFG is the following.

  <bb 7>:

  <bb 8>:
  # i_34 = PHI <1(3), prephitmp_55(7)>
  if (_49 > 1)
    goto <bb 10>;
  else
    goto <bb 9>;

  <bb 10>:
  _52 = i_34 + -1;
  _53 = i_34 + 1;
  goto <bb 5>;

  <bb 4>:

  <bb 5>:
  # j_35 = PHI <1(10), _14(4)>
  _10 = P[i_34][j_35];
  _11 = j_35 + -1;
  _12 = P[i_34][_11];
  _13 = _10 + _12;
  _14 = j_35 + 1;
  _15 = P[i_34][_14];
  _16 = _13 + _15;
  _18 = P[_52][j_35];
  _19 = _16 + _18;
  _21 = P[_53][j_35];
  _22 = _19 + _21;
  _23 = _22 / 5.0e+0;
  P[i_34][j_35] = _23;
  if (_14 < _49)
    goto <bb 4>;
  else
    goto <bb 6>;

  <bb 9>:
  _54 = i_34 + 1;
  goto <bb 6>;

  <bb 6>:
  # prephitmp_55 = PHI <_53(5), _54(9)>
  if (_30 > prephitmp_55)
    goto <bb 7>;
  else
    goto <bb 11>;

  <bb 7>:

  <bb 11>:
  return;

and hoisting the loop header check out of the outer loop fixes the PR
(so, for example, compiling at -O3 avoids it).  Loop unswitching performs
this optimization.  Not sure if we can improve things otherwise - the IV
computations of i + 1 / i - 1 are inside the inner loop and thus guarded
by the check.  The only thing we can do is try to improve the overflow
detection in SCEV / niter analysis to conclude that the now unsigned IV
{ 1, +, 1 }_1 does not overflow because of the loop header check:

_30 = N1_6(D) + -1;
if (_30 > 1)
  goto <bb 3>;
else
  goto <bb 11>;

that would be done via chrec_convert / convert_affine_scev here.  Sadly,
the outer loop has no control-IVs recorded, for example because of the
redundant _53 / _54 IV updates fed into the PHI in bb 6.  It looks like that
mess is caused by PRE first hoisting the loop-invariant i + 1 / i - 1
and then PREing it (but not hoisting it, because it doesn't implement that...).
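
For illustration only, here is a rough source-level sketch in C of what
hoisting the inner loop header check out of the outer loop amounts to.  The
stencil body follows the GIMPLE above, but everything else is an assumption
and not the PR's testcase: the function names, the fixed extent N, the int
index type and the small comparison helper are all hypothetical.

```c
#include <assert.h>
#include <string.h>

#define N 8  /* hypothetical array extent, not taken from the PR */

/* Assumed shape of the original nest: the inner loop header check
   (n2 - 1 > 1) is re-evaluated on every outer iteration.  */
static void stencil(double P[N][N], int n1, int n2)
{
  for (int i = 1; i < n1 - 1; i++)
    for (int j = 1; j < n2 - 1; j++)
      P[i][j] = (P[i][j] + P[i][j - 1] + P[i][j + 1]
                 + P[i - 1][j] + P[i + 1][j]) / 5.0;
}

/* Hand-unswitched variant: the invariant check is tested once, outside
   the outer loop, so the inner loop runs unconditionally inside it.
   The "else" copy of the outer loop would be empty and is dropped.  */
static void stencil_unswitched(double P[N][N], int n1, int n2)
{
  if (n2 - 1 > 1)
    for (int i = 1; i < n1 - 1; i++)
      {
        int j = 1;
        do
          {
            P[i][j] = (P[i][j] + P[i][j - 1] + P[i][j + 1]
                       + P[i - 1][j] + P[i + 1][j]) / 5.0;
            j++;
          }
        while (j < n2 - 1);
      }
}

/* Hypothetical helper: run both variants on identical input and
   return 1 if the resulting arrays are bitwise identical.  */
static int stencils_agree(int n1, int n2)
{
  double a[N][N], b[N][N];

  assert(n1 <= N && n2 <= N);
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      a[i][j] = b[i][j] = (double) (i * N + j);

  stencil(a, n1, n2);
  stencil_unswitched(b, n1, n2);
  return memcmp(a, b, sizeof a) == 0;
}
```

Both variants perform the same floating-point operations in the same order,
so stencils_agree (8, 8) should return 1, as should the degenerate cases such
as stencils_agree (8, 2) where the hoisted check is false.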
