http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50567
Bug #: 50567
Summary: Reload pass generates sub-optimal spill code for registers in presence of a vec_concat insn
Classification: Unclassified
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: siddhesh.poyare...@gmail.com

Reduced program:

    typedef long long __m128i __attribute__ ((__vector_size__ (16)));

    __m128i process(char *mem1, char *mem2)
    {
      long long frag1, frag2;

      frag2 = frag1 = *((long long *) mem1);
      if (mem2 > mem1)
        frag2 = *((long long *) mem2);

      return (__m128i){frag2, frag1};
    }

This program gets redundant spills generated during the reload pass. IRA itself does not spill anything:

    process:
    .LFB0:
        .cfi_startproc
        movq    (%rdi), %rax
        cmpq    %rsi, %rdi
        movq    %rax, %rdx
        jae     .L2
        movq    (%rsi), %rdx
    .L2:
        movq    %rdx, -16(%rsp)    <== here onwards
        movq    -16(%rsp), %xmm1
        pinsrq  $1, %rax, %xmm1
        movdqa  %xmm1, %xmm0
        ret

This seems to happen because the pinsrq instruction (the vec_concat implementation for x86_64) takes an SSE register as both its input and its output operand. Because of this, the reload pass generates spill code to move %rdx to %xmm1 through the stack, and then an extra move from %xmm1 to %xmm0.

Ideally, the generated code should look like this:

    process:
    .LFB0:
        .cfi_startproc
        movq    (%rdi), %rax
        cmpq    %rsi, %rdi
        movq    %rax, %rdx
        jae     .L2
        movq    (%rsi), %rdx
    .L2:
        movq    %rdx, %xmm0
        pinsrq  $1, %rax, %xmm0
        ret
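
For anyone trying to reproduce this, the following is a minimal sketch of a self-check around the reduced program. It only confirms what process() is expected to return on both branches, so that either assembly sequence can be checked for correctness; it does not detect the spill itself. The typedef is renamed to v2di and the helper check_process() is added here; both names are hypothetical and are not part of the report. The report does not state the compiler flags used. Since pinsrq is an SSE4.1 instruction, something like `gcc -O2 -msse4.1 -S` is a reasonable guess, but that is an assumption.

```c
#include <string.h>

/* Vector type from the report, renamed to v2di (hypothetical name) so it
   cannot clash with the __m128i typedef from <emmintrin.h>. */
typedef long long v2di __attribute__ ((__vector_size__ (16)));

/* Reduced program from the report, unchanged apart from the type name. */
v2di process(char *mem1, char *mem2)
{
  long long frag1, frag2;

  frag2 = frag1 = *((long long *) mem1);
  if (mem2 > mem1)
    frag2 = *((long long *) mem2);

  return (v2di){frag2, frag1};
}

/* Hypothetical helper, not part of the report: returns 0 when both
   branches of process() produce the expected vector lanes. */
int check_process(void)
{
  /* Both pointers refer into the same array, so the pointer comparison
     inside process() is well defined. */
  long long buf[2] = { 11, 22 };
  long long out[2];
  v2di v;

  /* mem2 > mem1: lane 0 is loaded from mem2, lane 1 from mem1. */
  v = process((char *) &buf[0], (char *) &buf[1]);
  memcpy(out, &v, sizeof out);
  if (out[0] != 22 || out[1] != 11)
    return 1;

  /* mem2 < mem1: both lanes hold the value loaded from mem1. */
  v = process((char *) &buf[1], (char *) &buf[0]);
  memcpy(out, &v, sizeof out);
  if (out[0] != 22 || out[1] != 22)
    return 2;

  return 0;
}
```

Calling check_process() from a small main() and compiling the same file with -S should allow the body of process() to be compared against the two listings above.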