https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71437
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
                 CC|                            |amker at gcc dot gnu.org
           Assignee|rguenth at gcc dot gnu.org  |unassigned at gcc dot gnu.org

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
With -fwhole-program there's no regression from GCC 6.2 to current trunk.
Without it I can still see a small regression (here 0.86s vs 0.92s).

From looking at the assembly it's hard to tell what the issue is. perf
seems to show hot spots at mispredicted branches (for both the good and
the bad case).

In .optimized I see that IVO makes different choices for trunk, with the
input into IVO being more or less the same. Trunk ends up with

  <bb 6> [92.50%]:
  # i_138 = PHI <i_128(10), 0(5)>
  # ivtmp.78_378 = PHI <ivtmp.78_377(10), ivtmp.78_376(5)>
  _5 = (const int *) ivtmp.78_378;
  _366 = (void *) ivtmp.78_378;
  _6 = MEM[base: _366, offset: 0B];
  if (_6 > L.2_7)
    goto <bb 7>; [50.00%]
  else
    goto <bb 9>; [50.00%]

  <bb 7> [46.25%]:
  _370 = (unsigned int) i_138;
  _369 = _370 * 4;
  _10 = _369;
  _368 = ivtmp.78_378 + 4294967292;
  _367 = (const int *) _368;
  _11 = _367;
  _374 = (void *) ivtmp.78_378;
  _12 = MEM[base: _374, offset: 4294967292B];
  if (L.2_7 >= _12)
    goto <bb 8>; [7.50%]
  else
    goto <bb 9>; [92.50%]

  <bb 9> [89.03%]:
  i_128 = i_138 + 1;
  ivtmp.78_377 = ivtmp.78_378 + 4;
  if (i_128 != _371)
    goto <bb 10>; [92.50%]
  else
    goto <bb 11>; [7.50%]

  <bb 10> [82.35%]:
  goto <bb 6>; [100.00%]

while GCC 6 did

  <bb 8>:
  # i_153 = PHI <0(7), i_19(12)>
  _572 = (sizetype) i_153;
  _17 = MEM[base: pretmp_509, index: _572, step: 4, offset: 4B];
  if (_17 > pretmp_506)
    goto <bb 9>;
  else
    goto <bb 11>;

  <bb 9>:
  _591 = (sizetype) i_153;
  _22 = MEM[base: pretmp_509, index: _591, step: 4, offset: 0B];
  if (_22 <= pretmp_506)
    goto <bb 10>;
  else
    goto <bb 11>;

  <bb 11>:
  i_19 = i_153 + 1;
  if (i_19 != _573)
    goto <bb 12>;
  else
    goto <bb 13>;

  <bb 12>:
  goto <bb 8>;

but I'm not sure whether that ends up slower.
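The testcase source is not part of this comment, so the following C sketch is only a hypothetical reconstruction of the innermost loop from the two dumps, assuming it searches for the first i with a[i] <= key < a[i + 1] (the names find_indexed, find_pointer, a, n and key are illustrative, not from the testcase). It writes out at the source level the two addressing shapes IVO ended up with: GCC 6 keeps the single IV i and addresses both loads as base + i*4 (offsets 4B and 0B), while trunk adds a pointer IV (ivtmp.78) that points at a[i + 1] and reaches a[i] with offset 4294967292, i.e. -4 as a 32-bit unsigned value.

```c
#include <stddef.h>

/* Hypothetical reconstruction; the real testcase is not shown.
   Returns the first i in [0, n) with a[i] <= key < a[i + 1], or -1.
   The caller must provide n + 1 readable elements a[0..n]. */

/* Shape of the GCC 6 lowering: one IV (i); both loads are
   computed as base + i*4 (+ 4 for a[i + 1]). */
int find_indexed(const int *a, int n, int key)
{
    for (int i = 0; i < n; i++) {
        if (a[i + 1] > key && a[i] <= key)
            return i;
    }
    return -1;
}

/* Shape of the trunk lowering: a second IV p starts at &a[1] and
   advances by one element per iteration alongside i; the loads
   use p[0] (offset 0) and p[-1] (offset -4 bytes). */
int find_pointer(const int *a, int n, int key)
{
    const int *p = a + 1;
    for (int i = 0; i < n; i++, p++) {
        if (p[0] > key && key >= p[-1])
            return i;
    }
    return -1;
}
```

Both functions compute the same result; writing one shape or the other in C does not force it, since IVO picks the induction variables and addressing modes itself. The sketch only makes the difference between the two dumps easier to read.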
GCC 6.2 asm:

.L23:
        movl    %edx, %eax
.L27:
        movl    4(%esi,%eax,4), %ecx
        cmpl    %ebx, %ecx
        jle     .L11
        movl    (%esi,%eax,4), %ebp
        cmpl    %ebx, %ebp
        jle     .L34
.L11:
        leal    1(%eax), %edx
        cmpl    %edi, %edx
        jne     .L23

GCC 7:

.L23:
        movl    %edx, %ecx
.L13:
        cmpl    %esi, (%eax)
        movl    %eax, %ebx
        jle     .L11
        cmpl    -4(%eax), %esi
        leal    0(,%ecx,4), %edx
        leal    -4(%eax), %ebp
        jge     .L30
.L11:
        leal    1(%ecx), %edx
        addl    $4, %eax
        cmpl    %edi, %edx
        jne     .L23

At least this is the most notable difference in the innermost loops on
GIMPLE (there are plenty of differences in the outer-loop code). Bin, any
idea why IVO makes the "bad" choice here?