https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87007
Bug ID: 87007
Summary: [8/9 Regression] 10% slowdown with -march=skylake-avx512
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: hjl.tools at gmail dot com
CC: skpgkp1 at gmail dot com
Target Milestone: ---
Target: i386,x86-64

On Intel Skylake server, r262649 caused a 10% slowdown for 538.imagick_r in SPEC CPU 2017 when compiled with:

gcc -Ofast -march=skylake-avx512 -mfpmath=sse -fno-associative-math -funroll-loops -flto

For this test case:

[hjl@gnu-cfl-1 skx-2]$ cat foo.i
extern float f;
extern double d;
extern int i;

void
foo (void)
{
  d = f;
  f = i;
}

r262649 turned on sse_partial_reg_dependency, which generates

        vxorpd     %xmm0, %xmm0, %xmm0
        vcvtss2sd  f(%rip), %xmm0, %xmm0
        vmovsd     %xmm0, d(%rip)
        vxorps     %xmm0, %xmm0, %xmm0
        vcvtsi2ss  i(%rip), %xmm0, %xmm0
        vmovss     %xmm0, f(%rip)
        ret

instead of

        vcvtss2sd  f(%rip), %xmm0, %xmm0
        vmovsd     %xmm0, d(%rip)
        vcvtsi2ss  i(%rip), %xmm0, %xmm0
        vmovss     %xmm0, f(%rip)
        ret

Only one "vxorpd %xmm0, %xmm0, %xmm0" is necessary. Emitting both "vxorpd %xmm0, %xmm0, %xmm0" and "vxorps %xmm0, %xmm0, %xmm0" is bad for performance.
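The partial register dependency comes from the merge semantics of the VEX-encoded scalar conversions: vcvtss2sd writes bits 63:0 of the destination and copies bits 127:64 from its first source register, and vcvtsi2ss writes bits 31:0 and copies bits 127:32. Without a zeroing idiom, each conversion therefore has to wait for whatever last wrote %xmm0. The C sketch below is a hypothetical data-flow model only (the xmm_t type and the model_* functions are illustrative names, not a real API, and timing is not modeled). model_foo shows one possible way to need just a single zeroing idiom: zero one register once and reuse it as the merge source for both conversions. This is an assumption about a better instruction sequence, not GCC's actual output.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit XMM register as two 64-bit halves.
   Only data flow is modeled; latency and port usage are not. */
typedef struct { uint64_t lo, hi; } xmm_t;

/* vxorps/vxorpd reg,reg,reg: all zeros, independent of the old value
   (the CPU treats it as a zeroing idiom). */
xmm_t model_vxor_zero(void) {
    xmm_t r = { 0, 0 };
    return r;
}

/* Model of VCVTSS2SD xmm1, xmm2, m32:
   bits 63:0 = (double) m32, bits 127:64 copied from xmm2. */
xmm_t model_vcvtss2sd(xmm_t src1, float m32) {
    xmm_t r;
    double v = (double) m32;
    memcpy(&r.lo, &v, sizeof v);
    r.hi = src1.hi;              /* merge: result depends on old src1 */
    return r;
}

/* Model of VCVTSI2SS xmm1, xmm2, m32:
   bits 31:0 = (float) m32, bits 127:32 copied from xmm2. */
xmm_t model_vcvtsi2ss(xmm_t src1, int32_t m32) {
    xmm_t r = src1;              /* bits 127:32 come from src1 */
    float v = (float) m32;
    uint32_t bits;
    memcpy(&bits, &v, sizeof bits);
    r.lo = (src1.lo & UINT64_C(0xFFFFFFFF00000000)) | bits;
    return r;
}

/* Sketch of foo() with one zeroed register reused as the merge source
   for both conversions, so no second xor is needed. */
void model_foo(float f_in, int32_t i_in, double *d_out, float *f_out) {
    xmm_t z = model_vxor_zero();          /* single zeroing idiom */
    xmm_t t0 = model_vcvtss2sd(z, f_in);  /* merges from z, not stale xmm0 */
    memcpy(d_out, &t0.lo, sizeof *d_out); /* like vmovsd store to d */
    xmm_t t1 = model_vcvtsi2ss(z, i_in);  /* reuses z */
    uint32_t bits = (uint32_t) t1.lo;
    memcpy(f_out, &bits, sizeof *f_out);  /* like vmovss store to f */
}
```

Passing a register with stale contents as src1 shows the upper bits flowing into the result, which is exactly the dependency the xor is meant to break.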