https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103611
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Hmm, GCC 4.8.1-5.5.0 produces:
long long SSE2ExtractInt64<0>(long long __vector):
.LFB499:
        .cfi_startproc
        pshufd  xmm1, xmm0, 1
        movd    eax, xmm0
        movd    edx, xmm1
        ret
long long SSE2ExtractInt64<1>(long long __vector):
.LFB500:
        .cfi_startproc
        pshufd  xmm1, xmm0, 3
        pshufd  xmm0, xmm0, 2
        movd    edx, xmm1
        movd    eax, xmm0
        ret

For the code in comment #0.
And always used memory for code in comment #2.
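
For readers following the listings, here is a rough, portable C++ model of what the two instruction sequences appear to compute. It is only a sketch: it assumes the output is for a 32-bit x86 target (where a long long is returned in edx:eax), and the names Xmm, pshufd, movd and extract_int64_model are hypothetical helpers invented for illustration, not the code from comment #0.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model only: a 128-bit XMM register as four 32-bit lanes,
// lane[0] being the least significant.
struct Xmm { std::uint32_t lane[4]; };

// Models `pshufd dst, src, imm`: each 2-bit field of imm selects the source
// lane that is copied into the corresponding destination lane.
inline Xmm pshufd(const Xmm& src, unsigned imm) {
    Xmm dst;
    for (int i = 0; i < 4; ++i)
        dst.lane[i] = src.lane[(imm >> (2 * i)) & 3u];
    return dst;
}

// Models `movd r32, xmm`: copies lane 0 into a general-purpose register.
inline std::uint32_t movd(const Xmm& x) { return x.lane[0]; }

// Hypothetical stand-in for SSE2ExtractInt64<Index>. For Index == 0 the real
// listing reads eax straight from xmm0; a shuffle with imm 0 leaves lane 0
// unchanged, so the model gives the same value.
template <int Index>
std::int64_t extract_int64_model(const Xmm& v) {
    std::uint32_t eax = movd(pshufd(v, 2 * Index));      // low 32 bits
    std::uint32_t edx = movd(pshufd(v, 2 * Index + 1));  // high 32 bits
    return static_cast<std::int64_t>(
        (static_cast<std::uint64_t>(edx) << 32) | eax);
}
```

In this model, extract_int64_model<1> uses shuffle immediates 2 and 3, matching the two pshufd instructions in the second listing.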