https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582
--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Just trying a dumb microbenchmark:
struct S { unsigned long a, b; } s;

__attribute__((noipa)) void
foo (unsigned long a, unsigned long b)
{
  s.a = a; s.b = b;
}

int
main ()
{
  int i;
  for (i = 0; i < 1000000000; i++)
    foo (42, 43);
  return 0;
}
the GCC 11 vs. GCC 12 code:
-       movq    %rdi, s(%rip)
-       movq    %rsi, s+8(%rip)
+       movq    %rdi, %xmm0
+       movq    %rsi, %xmm1
+       punpcklqdq      %xmm1, %xmm0
+       movaps  %xmm0, s(%rip)
seems to be exactly the same speed (on i9-7960X) and the GCC 11 code is 7 bytes
smaller.
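The 7-byte figure can be cross-checked by adding up the instruction lengths. The sketch below is not part of the original comment. The per-instruction lengths are an assumption based on the standard x86-64 encodings (RIP-relative operands with a 32-bit displacement, no extra prefixes), and the names `gcc11_bytes` and `gcc12_bytes` are hypothetical helpers introduced only for this tally.

```c
#include <stddef.h>

/* Assumed encoded length in bytes of each instruction in the diff above.
   GCC 11: two 64-bit stores, each REX.W + opcode + ModRM + disp32.  */
static const unsigned char gcc11_len[] = {
  7,  /* movq %rdi, s(%rip)    */
  7   /* movq %rsi, s+8(%rip)  */
};

/* GCC 12: two GPR-to-XMM moves, an unpack, and one 16-byte aligned store.  */
static const unsigned char gcc12_len[] = {
  5,  /* movq %rdi, %xmm0          (66 REX.W 0F 6E /r) */
  5,  /* movq %rsi, %xmm1          (66 REX.W 0F 6E /r) */
  4,  /* punpcklqdq %xmm1, %xmm0   (66 0F 6C /r)       */
  7   /* movaps %xmm0, s(%rip)     (0F 29 /r + disp32) */
};

/* Sum the lengths of N instructions.  */
static size_t
total_bytes (const unsigned char *len, size_t n)
{
  size_t sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += len[i];
  return sum;
}

/* Total size of the GCC 11 store sequence.  */
size_t
gcc11_bytes (void)
{
  return total_bytes (gcc11_len, sizeof gcc11_len);
}

/* Total size of the GCC 12 store sequence.  */
size_t
gcc12_bytes (void)
{
  return total_bytes (gcc12_len, sizeof gcc12_len);
}
```

Under these assumed encodings the two sequences come to 14 and 21 bytes, which matches the 7-byte difference reported above. Disassembling the actual object files with objdump -d would be the authoritative check.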