https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114987
--- Comment #6 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> I tried to move "vmovdqa %xmm1,0xd0(%rsp)" before "vmovdqa %xmm0,0xe0(%rsp)"
> and rebuilt the binary and it will save half the regression.

 57.93 │200:   vaddps  0xc0(%rsp),%ymm3,%ymm5
 11.11 │       vaddps  0xe0(%rsp),%ymm2,%ymm6
...
  3.22 │       vmovdqa %xmm1,0xc0(%rsp)
       │       vmovdqa %xmm5,0xd0(%rsp)
  3.52 │       vmovdqa %xmm0,0xe0(%rsp)
       │       vmovdqa %xmm6,0xf0(%rsp)

I guess there are specific patterns in the SKX microarchitecture for STLF
(store-to-load forwarding); the main difference is the instruction order of
those xmm stores.

From the compiler side, the worthwhile thing to do is PR107916.
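
For readers unfamiliar with the access pattern in the listing above, here is a minimal C sketch of it: a 32-byte stack slot is written as two 16-byte halves and then read back with a single 32-byte load. This is an illustration only, not code from the PR or its testcase. The names `half128`, `full256`, and `split_store_then_wide_load` are hypothetical, and an optimizing compiler may well remove the memory round trip entirely, so it should not be expected to reproduce the regression.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for an xmm (16-byte) and a ymm (32-byte) value. */
typedef struct { float v[4]; } half128;
typedef struct { float v[8]; } full256;

/* Write a 32-byte slot as two 16-byte halves, then read it back as a whole.
   The wide load covers both narrow stores, which is the shape of the
   vmovdqa/vaddps sequence in the perf listing. */
float split_store_then_wide_load(const half128 *lo, const half128 *hi)
{
    _Alignas(32) unsigned char slot[32];
    full256 wide;
    float sum = 0.0f;

    assert(sizeof(full256) == 2 * sizeof(half128));

    memcpy(slot, lo, sizeof *lo);          /* like a 16-byte store to 0xc0(%rsp) */
    memcpy(slot + 16, hi, sizeof *hi);     /* like a 16-byte store to 0xd0(%rsp) */
    memcpy(&wide, slot, sizeof wide);      /* like the 32-byte load from 0xc0(%rsp) */

    /* Sum the eight lanes so the loaded value is actually used. */
    for (int i = 0; i < 8; i++)
        sum += wide.v[i];
    return sum;
}
```

The sketch only shows the shape of the problem: the wide load needs data from two separate earlier stores. Whether and how fast the hardware can satisfy such a load from the store buffer is microarchitecture-specific, which is what the comment above is speculating about.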