http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54062
             Bug #: 54062
           Summary: extraneous move due to register allocation issue on x86_64
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: mi...@it.uu.se

Created attachment 27852
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27852
test case

The attached test case defines two functions for inserting an element in a doubly-linked circular list. Ideally the two functions should compile to equivalent code, but on x86_64 the "good" function consistently gets an extra move to bridge the prologue to the final insertion block for the case where the list is empty (2nd insn after .L6 below):

--- bad 2012-07-21 12:29:46.000000000 +0200
+++ good 2012-07-21 12:29:45.000000000 +0200
@@ -1,4 +1,4 @@
-clocksource_enqueue_bad:
+clocksource_enqueue_good:
        movq    clocksource_list(%rip), %rax
        cmpq    $clocksource_list, %rax
        je      .L6
@@ -10,14 +10,15 @@
        movq    (%rax), %rax
        cmpq    $clocksource_list, %rax
        jne     .L5
-       movq    (%rdx), %rax
+       movq    (%rdx), %rcx
 .L2:
-       leaq    8(%rdi), %rcx
-       movq    %rcx, 8(%rax)
-       movq    %rax, 8(%rdi)
+       leaq    8(%rdi), %rax
+       movq    %rax, 8(%rcx)
+       movq    %rcx, 8(%rdi)
        movq    %rdx, 16(%rdi)
-       movq    %rcx, (%rdx)
+       movq    %rax, (%rdx)
        ret
 .L6:
-       movl    $clocksource_list, %edx
+       movl    $clocksource_list, %ecx
+       movq    %rcx, %rdx
        jmp     .L2

This occurs with gcc-4.8/4.7/4.6; I haven't checked older versions.

I'm entering this as a target bug because the extra move is not seen when compiling for sparc64, ppc64, or armv7. gcc-4.6 makes a similar mistake when compiling for sparc64 and ppc64, but that goes away with gcc-4.7. For armv7, gcc-4.6 to 4.8 consistently generate equivalent code for both functions.

The background is PR54031, a case where trunk briefly miscompiled some Linux kernel code that's technically invalid but has been working for ages. That code is the "bad" function here.
The "good" function is a rewrite to avoid the technical problem (computing &ptr->field when ptr doesn't actually point to an object of its type), but that unfortunately causes a code size regression.
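
For readers without the attachment, the difference between the two variants can be sketched roughly as follows. This is not the attached test case: the struct layout, the container_of and list_add helpers, and the function bodies are assumptions modelled on the Linux kernel's list idiom, and only the names clocksource_list, clocksource_enqueue_bad and clocksource_enqueue_good come from the assembly above. The "bad" loop forms a struct clocksource pointer from the list head itself (which is not embedded in any struct clocksource) and then takes &tmp->list to test for termination; the "good" loop walks the list_head nodes directly and only converts nodes that really are embedded in an element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout, assumed for illustration only. */
struct list_head { struct list_head *next, *prev; };
struct clocksource { int rating; struct list_head list; };

/* Empty circular list: the head points at itself. */
struct list_head clocksource_list = { &clocksource_list, &clocksource_list };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Insert node immediately after prev. */
static void list_add(struct list_head *node, struct list_head *prev)
{
    struct list_head *next = prev->next;

    next->prev = node;
    node->next = next;
    node->prev = prev;
    prev->next = node;
}

/* "bad": when the walk reaches the head, tmp is computed from
 * &clocksource_list, which is not part of any struct clocksource,
 * so &tmp->list is technically invalid even though it works in practice. */
void clocksource_enqueue_bad(struct clocksource *cs)
{
    struct list_head *entry = &clocksource_list;
    struct clocksource *tmp;

    assert(cs != NULL);
    for (tmp = container_of(clocksource_list.next, struct clocksource, list);
         &tmp->list != &clocksource_list;
         tmp = container_of(tmp->list.next, struct clocksource, list))
        if (tmp->rating >= cs->rating)
            entry = &tmp->list;   /* remember the insertion point */
    list_add(&cs->list, entry);
}

/* "good": iterate over list_head pointers and convert to the containing
 * struct only inside the loop body, where pos is a real element. */
void clocksource_enqueue_good(struct clocksource *cs)
{
    struct list_head *entry = &clocksource_list;
    struct list_head *pos;

    assert(cs != NULL);
    for (pos = clocksource_list.next; pos != &clocksource_list; pos = pos->next) {
        struct clocksource *tmp = container_of(pos, struct clocksource, list);

        if (tmp->rating >= cs->rating)
            entry = pos;          /* remember the insertion point */
    }
    list_add(&cs->list, entry);
}
```

Under these assumptions both functions keep the list sorted by descending rating and perform the same stores, which is why equivalent generated code would be expected.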