https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104582

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Just trying a dumb microbenchmark:
struct S { unsigned long a, b; } s;

__attribute__((noipa)) void
foo (unsigned long a, unsigned long b)
{
  s.a = a;
  s.b = b;
}

int
main ()
{
  int i;
  for (i = 0; i < 1000000000; i++)
    foo (42, 43);
  return 0;
}
the GCC 11 vs. GCC 12 code for foo (- lines are GCC 11, + lines are GCC 12):
-       movq    %rdi, s(%rip)
-       movq    %rsi, s+8(%rip)
+       movq    %rdi, %xmm0
+       movq    %rsi, %xmm1
+       punpcklqdq      %xmm1, %xmm0
+       movaps  %xmm0, s(%rip)
seems to run at exactly the same speed (on an i9-7960X), and the GCC 11 code is
7 bytes smaller.
