https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114987
--- Comment #7 from Haochen Jiang <haochen.jiang at intel dot com> ---
Furthermore, when I build with GCC 11, the codegen is much better:

vaddps 0xc0(%rsp),%ymm5,%ymm2
vaddps 0xe0(%rsp),%ymm4,%ymm1
vmovaps %ymm2,0x80(%rsp)
vmovdqa 0x90(%rsp),%xmm6
vmovaps %ymm1,0xa0(%rsp)
vmovdqa 0xb0(%rsp),%xmm7
vmovdqa %xmm2,0xc0(%rsp)
vmovdqa %xmm6,0xd0(%rsp)
vmovdqa %xmm1,0xe0(%rsp)
vmovdqa %xmm7,0xf0(%rsp)
sub $0x1,%eax
jne 401e00 <stress_vecfp_float_add_16.avx.1+0x1e0>

It seems we might have two separate issues behind this regression.