http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48037
Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |glisse at gcc dot gnu.org

--- Comment #9 from Marc Glisse <glisse at gcc dot gnu.org> 2012-10-23 18:42:28 UTC ---
(In reply to comment #8)
> The useless reg-reg moves prevail:
>
> _Z6vsqrt2U8__vectord:
> .LFB520:
>         .cfi_startproc
>         sqrtsd   %xmm0, %xmm1
>         unpckhpd %xmm0, %xmm0
>         movapd   %xmm1, %xmm2
>         sqrtsd   %xmm0, %xmm0
>         unpcklpd %xmm0, %xmm2
>         movapd   %xmm2, %xmm0
>         ret
>
> both movapds can be avoided by better register allocation.

With LRA we get one mov less:

_Z6vsqrt2U8__vectord:
.LFB521:
        .cfi_startproc
        sqrtsd   %xmm0, %xmm2
        unpckhpd %xmm0, %xmm0
        sqrtsd   %xmm0, %xmm1
        movapd   %xmm2, %xmm0
        unpcklpd %xmm1, %xmm0
        ret
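
For readers without the test case at hand: the mangled name `_Z6vsqrt2U8__vectord` decodes to a function `vsqrt2` taking a vector of `double`. The source is not quoted in this comment, so the following is only a sketch of what such a function might look like, assuming GCC's `vector_size` extension with per-element subscripting. The typedef name `v2df`, the function body, and the `self_check` helper are assumptions, not the reporter's code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical typedef: a 16-byte vector holding two doubles (GCC vector extension).
typedef double v2df __attribute__((vector_size(16)));

// Assumed shape of the function behind the listing: an element-wise
// square root, written as two scalar sqrt calls on the vector lanes.
v2df vsqrt2(v2df x)
{
    v2df r;
    r[0] = std::sqrt(x[0]);  // low lane
    r[1] = std::sqrt(x[1]);  // high lane
    return r;
}

// Small sanity check of the sketch itself (not part of the bug report).
void self_check()
{
    v2df in = {4.0, 9.0};
    v2df out = vsqrt2(in);
    assert(out[0] == 2.0);
    assert(out[1] == 3.0);
}
```

Compiling something of this shape with optimization on x86-64 and inspecting the assembly (`g++ -O2 -S`) is one way to compare the register moves discussed above; the exact instructions emitted depend on the compiler version and options.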