https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114987

--- Comment #6 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> I tried to move "vmovdqa %xmm1,0xd0(%rsp)" before "vmovdqa %xmm0,0xe0(%rsp)"
> and rebuilt the binary and it will save half the regression.

 57.93 │200:   vaddps       0xc0(%rsp),%ymm3,%ymm5                        
 11.11 │       vaddps       0xe0(%rsp),%ymm2,%ymm6
        ...
  3.22 │       vmovdqa      %xmm1,0xc0(%rsp)                                    
       │       vmovdqa      %xmm5,0xd0(%rsp)                                    
  3.52 │       vmovdqa      %xmm0,0xe0(%rsp)                              
       │       vmovdqa      %xmm6,0xf0(%rsp)   

I guess there are specific patterns in the SKX microarchitecture for STLF
(store-to-load forwarding); the main difference is the instruction order of
those xmm stores.

From the compiler side, the worthwhile thing to do is PR107916.
