https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103227
--- Comment #2 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
There is a difference in the inliner's decisions. Since all clones are the same
size, the result depends on the order in which the inliner picks them and
combines them before hitting large-function-growth. It seems that with the isra
ordering the inliner is simply less lucky.
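
To illustrate how the order alone can change the outcome, here is a toy model in
Python. It is only a sketch under stated assumptions and not GCC's actual
algorithm: the chain f0 -> f1 -> f2 -> f3, the body size of 1000 and the limit
of 3500 are all made up, and the real inliner uses a priority queue and a much
richer cost model. The only thing it shares with the real case is that
equal-sized clones get merged greedily until a growth limit is hit.

```python
def simulate(order, sizes, limit):
    """Greedily inline along a call chain f0 -> f1 -> ... -> fn.

    Edge number e is the call f_e -> f_(e+1). Edges are considered in the
    given order; an edge is inlined only if the function that would end up
    holding the code stays within `limit` (a hypothetical stand-in for
    large-function-growth).
    """
    size = list(sizes)               # current body size, including inlined callees
    root = list(range(len(sizes)))   # root[i] = function whose body holds f_i
    for e in order:
        caller, callee = e, e + 1
        target = root[caller]        # the body that would actually grow
        if size[target] + size[callee] > limit:
            continue                 # growth limit reached, edge stays a call
        size[target] += size[callee]
        # Everything that lived in the callee's body now lives in target.
        root = [target if r == callee else r for r in root]
    return root.count(0)             # copies of the clone merged into f0


SIZES = [1000, 1000, 1000, 1000]     # hypothetical: all clones the same size
LIMIT = 3500                         # hypothetical growth limit

top_down = simulate([0, 1, 2], SIZES, LIMIT)
bottom_up = simulate([2, 1, 0], SIZES, LIMIT)
print(top_down, bottom_up)
```

In this model, visiting the edges top-down merges three bodies into f0 before
the limit is hit, while visiting them bottom-up first builds a large f1 that no
longer fits into f0, so f0 ends up with no recursion inlined at all.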
Instead of the following inline stack:
IPA function summary for digits_2.constprop/143 inlinable
global time: 22960.500916
self size: 1277
global size: 2534
min size: 513
self stack: 261
global stack: 783
estimated growth:-488
size:513.000000, time:6690.410500
size:3.000000, time:2.000001, executed if:(not inlined)
size:0.500000, time:0.500000, executed if:(not inlined), nonconst
if:(op0[ref offset: 0] changed) && (not inlined)
size:138.500000, time:217.532556, nonconst if:(op0[ref offset: 0] changed)
size:36.000000, time:34.793911, executed if:(op0[ref offset: 0],(# % 3) ==
2), nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0],(# % 3) ==
2)
size:198.000000, time:574.099545, executed if:(op0[ref offset: 0],(# % 3)
== 2)
size:36.000000, time:34.793911, executed if:(op0[ref offset: 0],(# % 3) ==
1), nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0],(# % 3) ==
1)
size:270.000000, time:1357.103458, executed if:(op0[ref offset: 0],(# % 3)
== 1)
size:21.000000, time:375.971570, executed if:(op0[ref offset: 0] == 5)
size:1263.000000, time:12359.502960, executed if:(op0[ref offset: 0] != 8)
size:1.000000, time:0.900000, executed if:(op0[ref offset: 0] != 8),
nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0] != 8)
size:48.000000, time:1300.920311, executed if:(op0[ref offset: 0] == 8)
loop iterations: 0.68 for (op0[ref offset: 0] changed)
0.76 for (op0[ref offset: 0] changed)
0.88 for (op0[ref offset: 0] changed)
1.08 for (op0[ref offset: 0] changed)
1.40 for (op0[ref offset: 0] changed)
1.93 for (op0[ref offset: 0] changed)
2.80 for (op0[ref offset: 0] changed)
4.23 for (op0[ref offset: 0] changed)
11.88 for (op0[ref offset: 0] changed)
4.59 for (op0[ref offset: 0] changed)
3.16 for (op0[ref offset: 0] changed)
2.29 for (op0[ref offset: 0] changed)
1.76 for (op0[ref offset: 0] changed)
1.44 for (op0[ref offset: 0] changed)
1.24 for (op0[ref offset: 0] changed)
1.12 for (op0[ref offset: 0] changed)
calls:
covered.constprop/148 --param max-inline-insns-auto limit reached
freq:0.30 loop depth: 9 size: 4 time: 13 callee size:262 stack:1472
predicate: (op0[ref offset: 0] == 8)
op0 is compile time invariant
op0 points to local or readonly memory
op1 is compile time invariant
op1 points to local or readonly memory
digits_2.constprop/144 inlined
freq:0.90
Stack frame offset 261, callee self size 261
__builtin_unreachable/156 unreachable
freq:0.00 cross module loop depth:18 size: 0 time: 0 predicate:
(false)
op0 is compile time invariant
op0 points to local or readonly memory
op1 is compile time invariant
op1 points to local or readonly memory
digits_2.constprop/145 inlined
freq:0.81
Stack frame offset 522, callee self size 261
__builtin_unreachable/156 unreachable
freq:0.00 cross module loop depth:27 size: 0 time: 0 predicate:
(false)
op0 points to local or readonly memory
op1 is compile time invariant
op1 points to local or readonly memory
digits_2.constprop/146 --param large-function-growth limit reached
freq:0.73 loop depth:27 size: 2 time: 11 callee size:1019 stack:522
predicate: (op0[ref offset: 0] != 8)
op0 is compile time invariant
op0 points to local or readonly memory
where inlining fails only at recursion depth 4, with isra we get:
IPA function summary for digits_2.constprop.isra/163 inlinable
global time: 17184.704285
self size: 1277
global size: 1994
min size: 513
self stack: 261
global stack: 522
estimated growth:301
size:513.000000, time:6690.410500
size:3.000000, time:2.000001, executed if:(not inlined)
size:0.500000, time:0.500000, executed if:(not inlined), nonconst
if:(op0[ref offset: 0] changed) && (not inlined)
size:138.500000, time:217.532556, nonconst if:(op0[ref offset: 0] changed)
size:36.000000, time:34.793911, executed if:(op0[ref offset: 0],(# % 3) ==
2), nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0],(# % 3) ==
2)
size:198.000000, time:574.099545, executed if:(op0[ref offset: 0],(# % 3)
== 2)
size:36.000000, time:34.793911, executed if:(op0[ref offset: 0],(# % 3) ==
1), nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0],(# % 3) ==
1)
size:270.000000, time:1357.103458, executed if:(op0[ref offset: 0],(# % 3)
== 1)
size:21.000000, time:375.971570, executed if:(op0[ref offset: 0] == 5)
size:723.000000, time:6582.815331, executed if:(op0[ref offset: 0] != 8)
size:1.000000, time:0.900000, executed if:(op0[ref offset: 0] != 8),
nonconst if:(op0[ref offset: 0] changed) && (op0[ref offset: 0] != 8)
size:48.000000, time:1300.920311, executed if:(op0[ref offset: 0] == 8)
loop iterations: 0.68 for (op0[ref offset: 0] changed)
0.76 for (op0[ref offset: 0] changed)
0.88 for (op0[ref offset: 0] changed)
1.08 for (op0[ref offset: 0] changed)
1.40 for (op0[ref offset: 0] changed)
1.93 for (op0[ref offset: 0] changed)
2.80 for (op0[ref offset: 0] changed)
4.23 for (op0[ref offset: 0] changed)
11.88 for (op0[ref offset: 0] changed)
4.59 for (op0[ref offset: 0] changed)
3.16 for (op0[ref offset: 0] changed)
2.29 for (op0[ref offset: 0] changed)
1.76 for (op0[ref offset: 0] changed)
1.44 for (op0[ref offset: 0] changed)
1.24 for (op0[ref offset: 0] changed)
1.12 for (op0[ref offset: 0] changed)
calls:
digits_2.constprop.isra/162 inlined
freq:0.90
Stack frame offset 261, callee self size 261
digits_2.constprop.isra/161 --param large-function-growth limit reached
freq:0.81 loop depth:18 size: 2 time: 11 callee size:1033 stack:522
predicate: (op0[ref offset: 0] != 8)
op0 is compile time invariant
op0 points to local or readonly memory
__builtin_unreachable/168 unreachable
freq:0.00 cross module loop depth:18 size: 0 time: 0 predicate:
(false)
op0 is compile time invariant
op0 points to local or readonly memory
op1 is compile time invariant
op1 points to local or readonly memory
covered.constprop/148 --param max-inline-insns-auto limit reached
freq:0.30 loop depth: 9 size: 4 time: 13 callee size:262 stack:1472
predicate: (op0[ref offset: 0] == 8)
op0 is compile time invariant
op0 points to local or readonly memory
op1 is compile time invariant
op1 points to local or readonly memory
where we fail at depth 2.
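
The numbers in the two summaries can be cross-checked with a few lines of
Python. This is just arithmetic on values copied from the dumps above. Reading
"global stack / self stack" as the number of digits_2 bodies merged into the
root is my assumption; it matches the "Stack frame offset" lines, but it is not
something the dump states directly.

```python
SELF_STACK = 261  # "self stack" / "callee self size" in both dumps

# Values copied from the two IPA function summaries above.
no_isra = {"global_size": 2534, "global_stack": 783, "ne8_size": 1263}
isra = {"global_size": 1994, "global_stack": 522, "ne8_size": 723}

# Assumed reading: each merged body contributes one frame of SELF_STACK bytes.
bodies_no_isra = no_isra["global_stack"] // SELF_STACK
bodies_isra = isra["global_stack"] // SELF_STACK

# The global size difference should show up in the entry guarded by
# (op0[ref offset: 0] != 8), which is the path containing the recursive call.
global_diff = no_isra["global_size"] - isra["global_size"]
guarded_diff = no_isra["ne8_size"] - isra["ne8_size"]

print(bodies_no_isra, bodies_isra, global_diff, guarded_diff)
```

So the non-isra root holds one more copy of the body than the isra one, and
the whole size difference between the two summaries sits in the
(op0[ref offset: 0] != 8) entry.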