https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91154
--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---

Ah, because x86_64_general_operand allows memory but the v alternative does not, and reloading that is apparently more expensive than not doing that and reloading the general reg later. Fun.

Changing that to x86_64_nonmemory_operand makes the whole thing work nearly fully (for this testcase, breaking everything else of course). There's one GPR op remaining, again because we get memory, this time in the first operand, which I kept as nonimmediate_operand. Not sure how we make RA happier to reload a memory operand for the v,v,v alternative without doing that elsewhere.

        movl    $-987654321, %r10d
        vmovd   (%rdi), %xmm0
        leal    -1(%r8), %r9d
        xorl    %eax, %eax
        vmovd   %r10d, %xmm1
        .p2align 4,,10
        .p2align 3
.L3:
        vmovd   (%rdx,%rax,4), %xmm2
        vpaddd  %xmm2, %xmm0, %xmm0
        vmovd   %xmm0, 4(%rdi,%rax,4)
        movl    (%rcx,%rax,4), %r8d
        addl    (%rsi,%rax,4), %r8d
        vmovd   %r8d, %xmm3
        movq    %rax, %r8
        vpmaxsd %xmm0, %xmm3, %xmm0
        vpmaxsd %xmm1, %xmm0, %xmm0
        vmovd   %xmm0, 4(%rdi,%rax,4)
        addq    $1, %rax
        cmpq    %r9, %r8
        jne     .L3
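For readers without the testcase at hand, the plain C loop below computes what the listing above computes. It is a hypothetical reconstruction, read off the instructions and the SysV x86-64 argument registers (%rdi, %rsi, %rdx, %rcx, %r8d); it is not necessarily the testcase attached to this PR, and the function and parameter names are made up for illustration. Each statement is annotated with the instruction(s) it appears to map to; the `movl`/`addl` pair looks like the one GPR operation that is still left over.

```c
/* Hypothetical reconstruction of the loop in the listing; names are made up.
   Register roles assumed from the SysV x86-64 ABI:
   %rdi = a, %rsi = b, %rdx = c, %rcx = d, %r8d = n. */
#define CLAMP_MIN (-987654321) /* the constant materialized in %r10d/%xmm1 */

void loop_sketch(int *a, const int *b, const int *c, const int *d, int n)
{
    /* The listing iterates i = 0 .. n-1 and writes a[i+1]; here k = i + 1.
       (The n >= 1 guard is outside the excerpt; this for loop handles n <= 0.) */
    for (int k = 1; k <= n; k++) {
        int sc;
        a[k] = a[k - 1] + c[k - 1];           /* vpaddd, first vmovd store  */
        if ((sc = b[k - 1] + d[k - 1]) > a[k]) /* movl/addl done in GPRs     */
            a[k] = sc;                         /* first vpmaxsd              */
        if (a[k] < CLAMP_MIN)
            a[k] = CLAMP_MIN;                  /* second vpmaxsd, 2nd store  */
    }
}
```

The running value a[k-1] stays in %xmm0 across iterations, which is why the listing loads (%rdi) only once before .L3 even though the source re-reads a[k-1].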