https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87007

            Bug ID: 87007
           Summary: [8/9 Regression] 10% slowdown with
                    -march=skylake-avx512
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hjl.tools at gmail dot com
                CC: skpgkp1 at gmail dot com
  Target Milestone: ---
            Target: i386,x86-64

On Intel Skylake server, r262649 caused a 10% slowdown for 538.imagick_r
in SPEC CPU 2017 when compiled with:

gcc -Ofast -march=skylake-avx512 -mfpmath=sse -fno-associative-math
-funroll-loops -flto

For

[hjl@gnu-cfl-1 skx-2]$ cat foo.i
extern float f;
extern double d;
extern int i;

void
foo (void)
{
  d = f;
  f = i;
}

r262649 turned on sse_partial_reg_dependency, which generates

        vxorpd  %xmm0, %xmm0, %xmm0
        vcvtss2sd       f(%rip), %xmm0, %xmm0
        vmovsd  %xmm0, d(%rip)
        vxorps  %xmm0, %xmm0, %xmm0
        vcvtsi2ss       i(%rip), %xmm0, %xmm0
        vmovss  %xmm0, f(%rip)
        ret

instead of

        vcvtss2sd       f(%rip), %xmm0, %xmm0
        vmovsd  %xmm0, d(%rip)
        vcvtsi2ss       i(%rip), %xmm0, %xmm0
        vmovss  %xmm0, f(%rip)
        ret

One "vxorpd %xmm0, %xmm0, %xmm0" is necessary.  But both

vxorpd  %xmm0, %xmm0, %xmm0

and

vxorps  %xmm0, %xmm0, %xmm0

are bad for performance.
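
For readers unfamiliar with why a zeroing xor shows up in front of these
conversions at all: vcvtss2sd and vcvtsi2ss write only the low element of
the destination and merge the remaining bits from a source register, so the
result carries a dependency on whatever was last written to that register.
The sketch below illustrates that merge behaviour with SSE intrinsics. It is
only an illustration, not the code GCC generates for foo.i; the function
names are hypothetical, and which instructions the compiler actually emits
for it depends on the GCC version and tuning flags.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in xmmintrin.h */

/* Hypothetical helper: float -> double through an explicitly zeroed
   destination.  _mm_cvtss_sd writes only the low 64 bits of the result
   and copies the upper 64 bits from its first argument, so starting from
   a zeroed register means the result does not depend on earlier
   register contents.  */
double
float_to_double_zeroed (float x)
{
  __m128d dst = _mm_setzero_pd ();           /* the dependency-breaking xor */
  dst = _mm_cvtss_sd (dst, _mm_set_ss (x));  /* convert into the low lane */
  return _mm_cvtsd_f64 (dst);
}

/* Hypothetical helper: same idea for int -> float.  */
float
int_to_float_zeroed (int x)
{
  __m128 dst = _mm_setzero_ps ();
  dst = _mm_cvtsi32_ss (dst, x);             /* upper lanes come from dst */
  return _mm_cvtss_f32 (dst);
}

/* Hypothetical helper showing the merge: the upper lane of the
   destination survives the conversion unchanged.  */
double
upper_lane_after_convert (double upper, float x)
{
  __m128d dst = _mm_set_pd (upper, 0.0);     /* high = upper, low = 0.0 */
  dst = _mm_cvtss_sd (dst, _mm_set_ss (x));
  return _mm_cvtsd_f64 (_mm_unpackhi_pd (dst, dst));
}
```

Because upper_lane_after_convert returns the value that was in the upper
lane before the conversion, the conversion has to read the old destination
register; that read is the partial register dependency the xor is meant to
break.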
