https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47754
Allan Jensen <linux at carewolf dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |linux at carewolf dot com
--- Comment #7 from Allan Jensen <linux at carewolf dot com> ---
This is significantly worse with integer operands. For example:
    _mm256_storeu_si256((__m256i *)&data[3],
        _mm256_add_epi32(_mm256_loadu_si256((const __m256i *)&data[0]),
                         _mm256_loadu_si256((const __m256i *)&data[1]))
    );
compiles to:
    vmovdqu      0x20(%rax),%xmm0
    vinserti128  $0x1,0x30(%rax),%ymm0,%ymm0
    vmovdqu      (%rax),%xmm1
    vinserti128  $0x1,0x10(%rax),%ymm1,%ymm1
    vpaddd       %ymm1,%ymm0,%ymm0
    vmovups      %xmm0,0x60(%rax)
    vextracti128 $0x1,%ymm0,0x70(%rax)
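
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the snippet above. The declaration of data is not shown in the comment, so the element type here is an assumption: the 0x20 stride between data[0], data[1] and data[3] in the listing suggests 32-byte elements, modelled below as a hypothetical row_t of eight int32_t. The names row_t, add_rows and self_check are illustrative only, and the target("avx2") attribute is there just so the file builds without -mavx2 on the command line.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/* Assumed element type: 8 x int32_t = 32 bytes, matching the 0x20 stride
   between data[0], data[1] and data[3] in the listing above. */
typedef int32_t row_t[8];

/* Same expression as in the comment: two unaligned 256-bit integer loads,
   a 32-bit lane-wise add, and one unaligned 256-bit store. */
__attribute__((target("avx2")))
void add_rows(row_t *data)
{
    __m256i a = _mm256_loadu_si256((const __m256i *)&data[0]);
    __m256i b = _mm256_loadu_si256((const __m256i *)&data[1]);
    _mm256_storeu_si256((__m256i *)&data[3], _mm256_add_epi32(a, b));
}

/* Functional sanity check only; it says nothing about code quality.
   Returns 1 if the check ran, 0 if the CPU lacks AVX2. */
int self_check(void)
{
    row_t data[4] = {{0}};
    for (int i = 0; i < 8; i++) {
        data[0][i] = i;
        data[1][i] = 100 * i;
    }
    if (!__builtin_cpu_supports("avx2"))
        return 0;
    add_rows(data);
    for (int i = 0; i < 8; i++)
        assert(data[3][i] == 101 * i);
    return 1;
}
```

Compiling this with something like gcc -O2 -S and inspecting add_rows should show whether the loads and the store are still split into 128-bit halves. The split pattern in the listing looks like what GCC's -mavx256-split-unaligned-load and -mavx256-split-unaligned-store tuning options produce, so comparing against a build with the -mno- forms of those options may be informative. The exact output will depend on the GCC version and the -mtune setting.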