-O1 -fpeel-loops

rguenth at gcc dot gnu.org Fri, 20 Apr 2018 00:17:30 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85475


--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
In dom3 I see

Applying pattern match.pd:2585, gimple-match.c:10461
Applying pattern match.pd:2585, gimple-match.c:10461
Applying pattern match.pd:2585, gimple-match.c:10461
Applying pattern match.pd:2585, gimple-match.c:10461
Applying pattern match.pd:2585, gimple-match.c:10461
Applying pattern match.pd:2585, gimple-match.c:10461
Applying pattern match.pd:2585, gimple-match.c:10461
Applying pattern match.pd:2585, gimple-match.c:10461
Applying pattern match.pd:2585, gimple-match.c:10461
Applying pattern match.pd:2585, gimple-match.c:10461
...

it is for example matching on  (le_10 * 2) * (le_10 * 2), transforming
it to (le_10 * le_10) * 4 which fails because of the :s.  But then DOM
faces the situation where the loop is fully unrolled and we have

nj (int le)
{
  int zb;
  unsigned int ivtmp_1;
  int _6;
  unsigned int ivtmp_16;
  unsigned int ivtmp_100;

  <bb 2> [local count: 63136020]:
  le_14 = le_3(D) * 2;
  zb_15 = 1;
  ivtmp_16 = 15;
  le_20 = le_14 * 2;
  zb_21 = 2;
  le_26 = le_20 * 2;
  zb_27 = 3;
  le_32 = le_26 * 2;
  zb_33 = 4;
  le_38 = le_32 * 2;
  zb_39 = 5;
  le_44 = le_38 * 2;
  zb_45 = 6;
  le_50 = le_44 * 2;
  zb_51 = 7;
  le_56 = le_50 * 2;
  zb_57 = 8;
  le_62 = le_56 * 2;
  zb_63 = 9;
  le_68 = le_62 * 2;
  zb_69 = 10;
  le_74 = le_68 * 2;
  zb_75 = 11;
  le_80 = le_74 * 2;
  zb_81 = 12;
  le_86 = le_80 * 2;
  zb_87 = 13;
  le_92 = le_86 * 2;
  zb_93 = 14;
  le_98 = le_92 * 2;
  zb_99 = 15;
  ivtmp_100 = 1;
  le_4 = le_98 * 2;
  zb_5 = 16;
  ivtmp_1 = 0;
  _6 = le_4 * le_4;
  return _6;

}

with DOM folding le_4 * le_4 via gimple_fold_stmt_to_constant_1.  You can
trivially see the exponential work we do when applying the pattern if you
consider that we re-simplify each simplification result.

/* Reassociate (X * CST) * Y to (X * Y) * CST.  This does not introduce
   signed overflow for CST != 0 && CST != -1.  */
(simplify
 (mult:c (mult:s @0 INTEGER_CST@1) @2)
 (if (TREE_CODE (@2) != INTEGER_CST
      && !integer_zerop (@1) && !integer_minus_onep (@1))
  (mult (mult @0 @2) @1)))

The issue is that we get into non-linear complexity once there's more
than a single use of the inner multiplication, so an explicit single_use
fixes that.

[Bug tree-optimization/85475] [8 Regression] Compile time hog w/ -O1 -fpeel-loops

Reply via email to