https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64477

            Bug ID: 64477
           Summary: x86 sse unnecessary GPR spill
           Product: gcc
           Version: 4.9.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: zoltan at hidvegi dot com

typedef signed char v16si __attribute__ ((vector_size (16)));
v16si ary(signed char a)
{
    return v16si{a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a};
}

Compiled with g++-4.9 -m64 -O2 -fomit-frame-pointer -Wall -I$HOME/dev/common
-mssse3 -std=gnu++11 -S xmm_test.C

I get

        pxor    %xmm1, %xmm1
        movd    %edi, %xmm0
        movl    %edi, -12(%rsp)
        pshufb  %xmm1, %xmm0
        ret

Note the unnecessary spill of %edi to the stack (the movl to -12(%rsp)). This
does not happen with gcc-4.8, so you may consider this a regression. I suspect
the compiler first tries to move from GPR to XMM via the stack, then later
optimizes this to a direct GPR-to-XMM move, but the stack store is left behind.

With -march=corei7-avx and a 4x4 int vector, gcc-4.9 uses a store to the stack
followed by vbroadcastss, whereas gcc-4.8 uses movd and
pshufd  $0, %xmm1, %xmm0. Again, the gcc-4.8 code seems better to me. However,
even gcc-4.8 goes through the stack in that case when -mtune=generic is used.
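
A minimal sketch of the 4x4 int case, assuming it simply mirrors the byte
version above. The names v4i and splat4 are illustrative and not taken from the
original test file:

```cpp
// Hypothetical reconstruction of the 4x4 int variant; names are illustrative.
typedef int v4i __attribute__ ((vector_size (16)));

// Broadcast one int into all four lanes of a 16-byte vector.
v4i splat4(int a)
{
    return v4i{a, a, a, a};
}
```

Compiling this with -O2 -march=corei7-avx -S under gcc-4.8 and gcc-4.9 should
allow the two code sequences to be compared.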
