http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34723

--- Comment #3 from Jeffrey A. Law <law at redhat dot com> ---
Andrew, no.

4.2 didn't muck things up at all.  The 4.2 code is clearly better (unless
you're vectorizing the loop).

What's happening is that the IV code changes the loop structure enough that
VRP2/DOM2 are unable to peel the first iteration off the loop.
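
For reference, the source loop is roughly the following (a reconstruction from
the dumps below; the exact names and types in the PR's testcase may differ):

  char table[10];

  int
  sum (void)
  {
    char val = 0;
    int i;
    for (i = 0; i < 10; i++)
      val += table[i];
    return val;
  }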

In gcc-4.2, just prior to VRP2 we have:
  # BLOCK 2 freq:1000
  # PRED: ENTRY [100.0%]  (fallthru,exec)
  # SUCC: 3 [100.0%]  (fallthru,exec)

 # BLOCK 3 freq:10000
  # PRED: 3 [90.0%]  (dfs_back,true,exec) 2 [100.0%]  (fallthru,exec)
  # ivtmp.43_3 = PHI <ivtmp.43_5(3), 0(2)>;
  # val_16 = PHI <val_11(3), 0(2)>;
<L0>:;
  D.1880_7 = MEM[symbol: table, index: ivtmp.43_3]{table[i]};
  D.1881_8 = (unsigned char) D.1880_7;
  val.1_9 = (unsigned char) val_16;
  D.1883_10 = val.1_9 + D.1881_8;
  val_11 = (char) D.1883_10;
  ivtmp.43_5 = ivtmp.43_3 + 1;
  if (ivtmp.43_5 != 10) goto <L0>; else goto <L2>;
  # SUCC: 3 [90.0%]  (dfs_back,true,exec) 4 [10.0%]  (loop_exit,false,exec)


VRP threads the jump through the backedge for the first iteration of the loop,
resulting in:
 # BLOCK 2 freq:1000
  # PRED: ENTRY [100.0%]  (fallthru,exec)
  goto <bb 5> (<L8>);
  # SUCC: 5 [100.0%]  (fallthru,exec)

  # BLOCK 3 freq:9000
  # PRED: 5 [100.0%]  (fallthru) 3 [88.9%]  (true,exec)
  # ivtmp.43_3 = PHI <ivtmp.43_23(5), ivtmp.43_5(3)>;
  # val_16 = PHI <val_22(5), val_11(3)>;
<L0>:;
  D.1880_7 = MEM[symbol: table, index: ivtmp.43_3]{table[i]};
  D.1881_8 = (unsigned char) D.1880_7;
  val.1_9 = (unsigned char) val_16;
  D.1883_10 = val.1_9 + D.1881_8;
  val_11 = (char) D.1883_10;
  ivtmp.43_5 = ivtmp.43_3 + 1;
  if (ivtmp.43_5 != 10) goto <L0>; else goto <L2>;
  # SUCC: 3 [88.9%]  (true,exec) 4 [11.1%]  (loop_exit,false,exec)

  # BLOCK 4 freq:1000
  # PRED: 3 [11.1%]  (loop_exit,false,exec)
  # val_2 = PHI <val_11(3)>;
<L2>:;
  D.1884_13 = (int) val_2;
  return D.1884_13;
  # SUCC: EXIT [100.0%]

  # BLOCK 5 freq:1000
  # PRED: 2 [100.0%]  (fallthru,exec)
  # ivtmp.43_17 = PHI <0(2)>;
  # val_1 = PHI <0(2)>;
<L8>:;
  D.1880_18 = MEM[symbol: table, index: ivtmp.43_17]{table[i]};
  D.1881_19 = (unsigned char) D.1880_18;
  val.1_20 = (unsigned char) val_1;
  D.1883_21 = val.1_20 + D.1881_19;
  val_22 = (char) D.1883_21;
  ivtmp.43_23 = ivtmp.43_17 + 1;
  goto <bb 3> (<L0>);
  # SUCC: 3 [100.0%]  (fallthru)

This will ultimately compile down to efficient code in which the first
iteration has been peeled off.
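
At the source level the peeling amounts to something like this (illustrative
only, using the reconstructed testcase from above):

  /* Before: the first iteration runs inside the loop.  */
  val = 0;
  for (i = 0; i < 10; i++)
    val += table[i];

  /* After: the first iteration is peeled; 0 + table[0] folds to table[0].  */
  val = table[0];
  for (i = 1; i < 10; i++)
    val += table[i];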

On the trunk, the order of DOM2 and VRP2 has changed, so if we look at the
code immediately prior to DOM2 we have:

 <bb 2>:
  ivtmp.10_16 = (unsigned long) &table;
  _12 = (unsigned long) &MEM[(void *)&table + 10B];
  goto <bb 4>;

  <bb 3>:

  <bb 4>:
  # val_14 = PHI <val_8(3), 0(2)>
  # ivtmp.10_18 = PHI <ivtmp.10_17(3), ivtmp.10_16(2)>
  _13 = (void *) ivtmp.10_18;
  _4 = MEM[base: _13, offset: 0B];
  _5 = (unsigned char) _4;
  val.0_6 = (unsigned char) val_14;
  _7 = _5 + val.0_6;
  val_8 = (char) _7;
  ivtmp.10_17 = ivtmp.10_18 + 1;
  if (ivtmp.10_17 != _12)
    goto <bb 3>;
  else
    goto <bb 5>;

Note how the test to go back to the top of the loop has changed.  It's no
longer a simple integer counter tested against a constant, which threading
handled nicely.  Instead it's a more complex test comparing two values derived
from the address of table, and neither DOM2 nor VRP2 is able to untangle it to
get the code we want.
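
A source-level analogue of the two exit-test shapes (illustrative only, not
what the compiler literally emits):

  /* 4.2 shape: integer counter tested against a constant, easy for the
     threader to reason about on the first iteration.  */
  for (i = 0; i != 10; i++)
    val += table[i];

  /* Trunk shape after ivopts: pointer tested against a computed end
     address, which DOM2/VRP2 fail to see through.  */
  for (p = table; p != table + 10; p++)
    val += *p;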

ISTM this should have a regression marker and be attached to the jump
threading meta-bug.


