https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86270
Bug ID: 86270
Summary: Simple loop needs an extra register and an extra instruction
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jamborm at gcc dot gnu.org
Target Milestone: ---

Compiling the following simple example with GCC 8 on x86_64 with just
-O2 -S:

----------------------------------------
int *a;
long len;

int test ()
{
  for (int i = 0; i < len + 1; i++)
    a[i]=i;
}
----------------------------------------

results in a loop that compares the value before the increment with the
upper bound, but only after the increment has already been done. The loop
therefore needs an extra register and an extra instruction, and has a
rather convoluted structure (alignment directives and some labels
omitted):

----------------------------------------
test:
        .cfi_startproc
        movq    len(%rip), %rcx
        testq   %rcx, %rcx
        js      .L2
        movq    a(%rip), %rsi
        xorl    %eax, %eax
        jmp     .L3
.L4:
        movq    %rdx, %rax
.L3:
        movl    %eax, (%rsi,%rax,4)
        leaq    1(%rax), %rdx
        cmpq    %rax, %rcx
        jne     .L4
.L2:
        ret
----------------------------------------

as opposed to the output of GCC 7, or of GCC 8 with -fno-tree-fwprop:

----------------------------------------
test:
        .cfi_startproc
        movq    len(%rip), %rdx
        testq   %rdx, %rdx
        js      .L2
        movq    a(%rip), %rcx
        addq    $1, %rdx
        xorl    %eax, %eax
.L3:
        movl    %eax, (%rcx,%rax,4)
        addq    $1, %rax
        cmpq    %rdx, %rax
        jne     .L3
.L2:
        ret
----------------------------------------

Because of this problem (specifically the need for an extra register),
465.tonto runs almost 5% faster on an AMD Ryzen machine when compiled
with -fno-tree-fwprop.
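
For illustration only, the two loop shapes in the assembly listings can be
written back as C. This is a hand-made sketch, not compiler output: the
function names, the CAP buffer size and the shapes_agree helper are
hypothetical, and the counters are declared long to mirror the 64-bit
registers in the listings rather than the int i of the test case.

```c
#include <assert.h>
#include <string.h>

enum { CAP = 64 };  /* hypothetical buffer size used only by the helper below */

/* Shape of the GCC 7 loop: the bound len + 1 is computed once before the
   loop, and the already incremented counter is compared against it.
   Live values inside the loop: dst, i, bound. */
void fill_gcc7_shape(int *dst, long len)
{
    if (len < 0)                /* testq/js guard: nothing to do */
        return;
    long bound = len + 1;
    long i = 0;
    do {
        dst[i] = (int)i;
        i++;
    } while (i != bound);
}

/* Shape of the GCC 8 loop: the incremented value is kept in a second
   variable (next) while the old value i is compared against len, and a
   copy next -> i is needed on the back edge.
   Live values inside the loop: dst, i, next, len. */
void fill_gcc8_shape(int *dst, long len)
{
    if (len < 0)
        return;
    long i = 0;
    for (;;) {
        dst[i] = (int)i;
        long next = i + 1;      /* leaq 1(%rax), %rdx */
        if (i == len)           /* cmpq %rax, %rcx */
            break;
        i = next;               /* movq %rdx, %rax at .L4 */
    }
}

/* Returns 1 if both shapes write the same elements for the given len.
   Requires len < CAP so that index len fits in the buffers. */
int shapes_agree(long len)
{
    int x[CAP], y[CAP];
    assert(len < CAP);
    memset(x, 0xFF, sizeof x);  /* mark every element as untouched */
    memset(y, 0xFF, sizeof y);
    fill_gcc7_shape(x, len);
    fill_gcc8_shape(y, len);
    return memcmp(x, y, sizeof x) == 0;
}
```

Both functions store the same values; the second one only differs in that
it keeps i and next alive at the same time, which corresponds to the extra
register and the extra move described above.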