https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103227

--- Comment #2 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
There is a difference in the inliner's decisions.  Since all clones are of
the same size, the result depends on the order in which the inliner picks
them and combines them together before hitting large-function-growth.  It
seems that with the isra ordering the inliner is simply less lucky.
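
To illustrate why the picking order alone can change the outcome, here is a
toy model.  It is not GCC's inliner: the chain of equally sized functions, the
single size limit, and the name inline_in_order are all assumptions made for
this sketch.  Each call edge is inlined greedily if the combined body still
fits under the limit.

```python
# Toy model only: NOT GCC's inliner, just an illustration of order dependence.
def inline_in_order(sizes, edge_order, limit):
    """Greedily inline the call edges of a chain f0 -> f1 -> f2 -> ...

    sizes[i] is the self size of function i; edge e is the call f_e -> f_{e+1}.
    An edge is inlined only if the resulting body stays within `limit`
    (a stand-in for a growth limit; the real heuristics are more involved).
    Returns the final body size of f0, the root of the chain.
    """
    body = list(sizes)                      # current body size per function
    inlined = [False] * (len(sizes) - 1)    # which edges have been inlined
    for e in edge_order:
        # The caller's code may already live inside an earlier function.
        root = e
        while root > 0 and inlined[root - 1]:
            root -= 1
        callee = e + 1
        if body[root] + body[callee] <= limit:
            body[root] += body[callee]
            inlined[e] = True
    return body[0]


same_size = [10] * 5
# Picking edges from the root downwards lets f0 absorb two callees.
top_down = inline_in_order(same_size, [0, 1, 2, 3], limit=30)
# Combining the inner clones first makes f1 too big to inline into f0.
inner_first = inline_in_order(same_size, [1, 2, 0, 3], limit=30)
print(top_down, inner_first)
```

With identical function sizes and the same limit, the first order leaves f0
with two levels inlined, while the second leaves it with none, because the
callee had already grown to the limit before the edge from f0 was considered.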

Instead of the inline stack:
IPA function summary for digits_2.constprop/143 inlinable
  global time:     22960.500916
  self size:       1277
  global size:     2534
  min size:       513
  self stack:      261
  global stack:    783
  estimated growth:-488
    size:513.000000, time:6690.410500
    size:3.000000, time:2.000001,  executed if:(not inlined)
    size:0.500000, time:0.500000,  executed if:(not inlined),  nonconst
if:(op0[ref offset: 0] changed) && (not inlined)
    size:138.500000, time:217.532556,  nonconst if:(op0[ref offset: 0] changed)
    size:36.000000, time:34.793911,  executed if:(op0[ref offset: 0],(# % 3) ==
2),  nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0],(# % 3) ==
2)
    size:198.000000, time:574.099545,  executed if:(op0[ref offset: 0],(# % 3)
== 2)
    size:36.000000, time:34.793911,  executed if:(op0[ref offset: 0],(# % 3) ==
1),  nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0],(# % 3) ==
1)
    size:270.000000, time:1357.103458,  executed if:(op0[ref offset: 0],(# % 3)
== 1)
    size:21.000000, time:375.971570,  executed if:(op0[ref offset: 0] == 5)
    size:1263.000000, time:12359.502960,  executed if:(op0[ref offset: 0] != 8)
    size:1.000000, time:0.900000,  executed if:(op0[ref offset: 0] != 8), 
nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0] != 8)
    size:48.000000, time:1300.920311,  executed if:(op0[ref offset: 0] == 8)
  loop iterations:  0.68 for (op0[ref offset: 0] changed)
  0.76 for (op0[ref offset: 0] changed)
  0.88 for (op0[ref offset: 0] changed)
  1.08 for (op0[ref offset: 0] changed)
  1.40 for (op0[ref offset: 0] changed)
  1.93 for (op0[ref offset: 0] changed)
  2.80 for (op0[ref offset: 0] changed)
  4.23 for (op0[ref offset: 0] changed)
  11.88 for (op0[ref offset: 0] changed)
  4.59 for (op0[ref offset: 0] changed)
  3.16 for (op0[ref offset: 0] changed)
  2.29 for (op0[ref offset: 0] changed)
  1.76 for (op0[ref offset: 0] changed)
  1.44 for (op0[ref offset: 0] changed)
  1.24 for (op0[ref offset: 0] changed)
  1.12 for (op0[ref offset: 0] changed)
  calls:
    covered.constprop/148 --param max-inline-insns-auto limit reached
      freq:0.30 loop depth: 9 size: 4 time: 13 callee size:262 stack:1472
predicate: (op0[ref offset: 0] == 8)
       op0 is compile time invariant
       op0 points to local or readonly memory
       op1 is compile time invariant
       op1 points to local or readonly memory
    digits_2.constprop/144 inlined
      freq:0.90
      Stack frame offset 261, callee self size 261
      __builtin_unreachable/156 unreachable
        freq:0.00 cross module loop depth:18 size: 0 time:  0 predicate:
(false)
         op0 is compile time invariant
         op0 points to local or readonly memory
         op1 is compile time invariant
         op1 points to local or readonly memory
      digits_2.constprop/145 inlined
        freq:0.81
        Stack frame offset 522, callee self size 261
        __builtin_unreachable/156 unreachable
          freq:0.00 cross module loop depth:27 size: 0 time:  0 predicate:
(false)
           op0 points to local or readonly memory
           op1 is compile time invariant
           op1 points to local or readonly memory
        digits_2.constprop/146 --param large-function-growth limit reached
          freq:0.73 loop depth:27 size: 2 time: 11 callee size:1019 stack:522
predicate: (op0[ref offset: 0] != 8)
           op0 is compile time invariant
           op0 points to local or readonly memory

where inlining fails only at recursion depth 4, we get:

IPA function summary for digits_2.constprop.isra/163 inlinable
  global time:     17184.704285
  self size:       1277
  global size:     1994
  min size:       513
  self stack:      261
  global stack:    522
  estimated growth:301
    size:513.000000, time:6690.410500
    size:3.000000, time:2.000001,  executed if:(not inlined)
    size:0.500000, time:0.500000,  executed if:(not inlined),  nonconst
if:(op0[ref offset: 0] changed) && (not inlined)
    size:138.500000, time:217.532556,  nonconst if:(op0[ref offset: 0] changed)
    size:36.000000, time:34.793911,  executed if:(op0[ref offset: 0],(# % 3) ==
2),  nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0],(# % 3) ==
2)
    size:198.000000, time:574.099545,  executed if:(op0[ref offset: 0],(# % 3)
== 2)
    size:36.000000, time:34.793911,  executed if:(op0[ref offset: 0],(# % 3) ==
1),  nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0],(# % 3) ==
1)
    size:270.000000, time:1357.103458,  executed if:(op0[ref offset: 0],(# % 3)
== 1)
    size:21.000000, time:375.971570,  executed if:(op0[ref offset: 0] == 5)
    size:723.000000, time:6582.815331,  executed if:(op0[ref offset: 0] != 8)
    size:1.000000, time:0.900000,  executed if:(op0[ref offset: 0] != 8), 
nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0] != 8)
    size:48.000000, time:1300.920311,  executed if:(op0[ref offset: 0] == 8)
  loop iterations:  0.68 for (op0[ref offset: 0] changed)
  0.76 for (op0[ref offset: 0] changed)
  0.88 for (op0[ref offset: 0] changed)
  1.08 for (op0[ref offset: 0] changed)
  1.40 for (op0[ref offset: 0] changed)
  1.93 for (op0[ref offset: 0] changed)
  2.80 for (op0[ref offset: 0] changed)
  4.23 for (op0[ref offset: 0] changed)
  11.88 for (op0[ref offset: 0] changed)
  4.59 for (op0[ref offset: 0] changed)
  3.16 for (op0[ref offset: 0] changed)
  2.29 for (op0[ref offset: 0] changed)
  1.76 for (op0[ref offset: 0] changed)
  1.44 for (op0[ref offset: 0] changed)
  1.24 for (op0[ref offset: 0] changed)
  1.12 for (op0[ref offset: 0] changed)
  calls:
    digits_2.constprop.isra/162 inlined
      freq:0.90
      Stack frame offset 261, callee self size 261
      digits_2.constprop.isra/161 --param large-function-growth limit reached
        freq:0.81 loop depth:18 size: 2 time: 11 callee size:1033 stack:522
predicate: (op0[ref offset: 0] != 8)
         op0 is compile time invariant
         op0 points to local or readonly memory
      __builtin_unreachable/168 unreachable
        freq:0.00 cross module loop depth:18 size: 0 time:  0 predicate:
(false)
         op0 is compile time invariant
         op0 points to local or readonly memory
         op1 is compile time invariant
         op1 points to local or readonly memory
    covered.constprop/148 --param max-inline-insns-auto limit reached
      freq:0.30 loop depth: 9 size: 4 time: 13 callee size:262 stack:1472
predicate: (op0[ref offset: 0] == 8)
       op0 is compile time invariant
       op0 points to local or readonly memory
       op1 is compile time invariant
       op1 points to local or readonly memory

where we fail at depth 2.
