http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49513

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2011.06.23 11:55:09
                 CC|                            |rguenth at gcc dot gnu.org
     Ever Confirmed|0                           |1
            Summary|introducing a product       |PRE inhibits if-conversion
                   |inhibit vectorization       |and vectorization

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-06-23 11:55:09 UTC ---
That is because PRE decided to optimize the first iteration, where it knows
that z == 0 and thus that z*s[i] and z*c[i] are 0, so a[i] will be 0.  The
resulting control flow confuses if-conversion, which in turn causes this
missed vectorization.
IL before if-conversion:

<bb 2>:
  goto <bb 6>;

<bb 3>:
  z_3 = (float) i_9;
  D.3280_4 = c[i_9];
  D.3281_5 = D.3280_4 * z_3;
  D.3282_6 = s[i_9];
  D.3283_7 = D.3282_6 * z_3;
  yy_13 = ABS_EXPR <D.3281_5>;
  yy_14 = ABS_EXPR <D.3283_7>;
  if (yy_13 < yy_14)
    goto <bb 5>;
  else
    goto <bb 4>;

<bb 4>:

<bb 5>:
  # yy_28 = PHI <yy_13(4), yy_14(3)>
  # yy_29 = PHI <yy_14(4), yy_13(3)>

<bb 6>:
  # yy_16 = PHI <yy_28(5), 0.0(2)>
  # yy_15 = PHI <yy_29(5), 0.0(2)>
  # i_30 = PHI <i_9(5), 0(2)>
  # ivtmp.44_24 = PHI <ivtmp.44_10(5), 1024(2)>
  t_17 = yy_15 / yy_16;
  a[i_30] = t_17;
  i_9 = i_30 + 1;
  ivtmp.44_10 = ivtmp.44_24 - 1;
  if (ivtmp.44_10 != 0)
    goto <bb 3>;
  else
    goto <bb 7>;

<bb 7>:
  return;

The PHI nodes in bb 6 are what make if-conversion fail (that is, the
irregular loop entry, which really should be peeled off).
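
For readers less used to GIMPLE dumps, the control flow above can be rendered
roughly as C.  This is only a hand-written sketch: the trip count 1024 and the
arrays a, s and c come from the dump and the testcase, while the function name
loop_as_seen_by_ifcvt and the variables num and den are invented here for
illustration.  The point is that the two values divided at the top of the loop
body arrive either from the function entry (the constants 0.0) or from the
previous trip, which is the irregular entry the PHI nodes in bb 6 encode.

```c
#include <math.h>

#define N 1024
float a[N], s[N], c[N];

/* Hand-written C rendering of the IL after PRE (hypothetical name). */
void loop_as_seen_by_ifcvt(void)
{
    float num = 0.0f;   /* yy_15: 0.0 on the entry edge from bb 2 */
    float den = 0.0f;   /* yy_16: 0.0 on the entry edge from bb 2 */
    int i = 0;          /* i_30 */

    for (;;) {
        /* bb 6: the store uses operands set up before this point,
           either on function entry or on the previous trip. */
        a[i] = num / den;
        i = i + 1;
        if (i == N)
            break;

        /* bb 3: compute the operands for the next store. */
        float z = (float) i;
        float yc = fabsf(c[i] * z);   /* yy_13 */
        float ys = fabsf(s[i] * z);   /* yy_14 */

        /* bb 4 / bb 5: the smaller value becomes the numerator. */
        if (yc < ys) {
            num = yc;
            den = ys;
        } else {
            num = ys;
            den = yc;
        }
    }
}
```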

This situation commonly occurs when PRE can compute the first iteration's
result.  Manually peeling off that iteration, as in the following, is a
workaround:

void foo2() {
  a[0] = 0;
  for (int i=1; i!=1024; ++i) {
    float z = i;
    a[i] = bar(z*s[i],z*c[i]);
  }
}
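
To try the workaround in isolation, a self-contained version might look like
the following.  The declarations of a, s and c and the body of bar are not part
of this comment; bar here is a stand-in reconstructed from the IL above (the
smaller absolute value divided by the larger), so treat it as an assumption.
foo is the unpeeled loop, included only for comparison.

```c
#include <math.h>

#define N 1024
float a[N], s[N], c[N];

/* Assumed stand-in for the testcase's bar(), reconstructed from the IL. */
static inline float bar(float x, float y)
{
    float xx = fabsf(x);
    float yy = fabsf(y);
    return (yy < xx) ? yy / xx : xx / yy;
}

/* Unpeeled loop, for comparison. */
void foo(void)
{
    for (int i = 0; i != N; ++i) {
        float z = i;
        a[i] = bar(z * s[i], z * c[i]);
    }
}

/* First iteration peeled off by hand, as in the workaround above. */
void foo2(void)
{
    a[0] = 0;
    for (int i = 1; i != N; ++i) {
        float z = i;
        a[i] = bar(z * s[i], z * c[i]);
    }
}
```

With this stand-in bar and without flags such as -ffast-math, element 0 of foo
is computed as 0.0/0.0, so only indices from 1 on are directly comparable
between foo and foo2.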

Not sure whether the original issue you derived this testcase from really
matches the above problem, though.
