https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91117
--- Comment #1 from Uroš Bizjak <ubizjak at gmail dot com> ---
The generated code looks the way it does partly by design, and partly because of a missing (!y,*x) alternative in *vec_extractv2di_0_sse.

GCC lowers intrinsics to generic operations, and this has its pluses (generic optimizations can be used) and minuses. As for the latter, generic operations should avoid allocating MMX registers unless absolutely necessary, and the compiler discourages their use by penalizing operations (moves) involving MMX registers.

gcc-10 emulates MMX operations using SSE instructions, so your code gets compiled to:

        pmullw  %xmm0, %xmm0
        movq    %xmm0, %xmm0
        ret
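
For readers who want to check what the emulated sequence computes, here is a rough portable C model of those two SSE instructions. It is only a sketch: the xmm_model type and the model_* helpers are hypothetical names invented for illustration, not GCC internals or intrinsics, and the model assumes the register is viewed as eight 16-bit lanes.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit XMM register as eight 16-bit lanes. */
typedef struct { uint16_t lane[8]; } xmm_model;

/* pmullw: multiply lane by lane and keep the low 16 bits of each product. */
static xmm_model model_pmullw(xmm_model dst, xmm_model src)
{
    xmm_model r;
    for (int i = 0; i < 8; i++) {
        /* Widen to 32 bits first so the multiplication cannot overflow int. */
        uint32_t product = (uint32_t)dst.lane[i] * (uint32_t)src.lane[i];
        r.lane[i] = (uint16_t)product;
    }
    return r;
}

/* movq xmm, xmm: copy the low 64 bits and clear the upper 64 bits. */
static xmm_model model_movq(xmm_model src)
{
    xmm_model r;
    for (int i = 0; i < 8; i++)
        r.lane[i] = (i < 4) ? src.lane[i] : 0;
    return r;
}

/* The sequence above: pmullw %xmm0, %xmm0 followed by movq %xmm0, %xmm0. */
static xmm_model model_sequence(xmm_model xmm0)
{
    return model_movq(model_pmullw(xmm0, xmm0));
}
```

In this model the low four lanes hold the 64-bit result that the MMX operation would have produced, and the movq step is what discards whatever the full-width multiply left in the upper half.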