https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80819
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |jakub at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Honza, thoughts on this?
Given:
        movq    %rdi, -16(%rsp)
        movq    -16(%rsp), %xmm0
        pinsrq  $1, %rsi, %xmm0
I'd say if pinsrq $1, %rsi, %xmm0 is not too slow on recent AMD, then either
movq %rdi, %xmm0 should be also not too slow, or pinsrq $0, %rdi, %xmm0 should
be the way to go.

Note current trunk still emits a dead store with -mtune=intel -O2 -msse4:
        movq    %rsi, -16(%rsp)
        movq    %rdi, %xmm0
        pinsrq  $1, %rsi, %xmm0
and with -mtune=generic -O2 -msse4:
        movq    %rdi, -16(%rsp)
        movq    %rsi, -24(%rsp)
        movq    -16(%rsp), %xmm0
        pinsrq  $1, %rsi, %xmm0
Wonder why doesn't DSE eliminate it.
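
For readers who want to inspect listings of this shape themselves, here is a
minimal sketch. It assumes the sequences above come from building a two-element
64-bit vector out of two integer arguments: on x86-64 SysV the first two integer
arguments arrive in %rdi and %rsi, and a 128-bit vector is returned in %xmm0.
The comment does not quote its source, so this is not necessarily the testcase
from the report; the names v2di and build_v2di are hypothetical.

```c
/* Hypothetical reproducer; v2di and build_v2di are illustrative names
   that do not come from the bug report.  Uses the GCC generic vector
   extension rather than a specific intrinsic.  */
typedef long long v2di __attribute__ ((vector_size (16)));

/* On x86-64 SysV, lo arrives in %rdi and hi in %rsi; the result is
   returned in %xmm0.  Element 0 is the low 64 bits, element 1 the
   high 64 bits (the lane written by pinsrq $1).  */
v2di
build_v2di (long long lo, long long hi)
{
  v2di v = { lo, hi };
  return v;
}
```

Compiling this with, for example, gcc -O2 -msse4 -mtune=generic -S and again
with -mtune=intel should make it possible to compare the generated code with
the listings above; the exact output depends on the compiler version.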