https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92651

            Bug ID: 92651
           Summary: [10 Regression] Unnecessary stv transform in some x86
                    backend
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wwwhhhyyy333 at gmail dot com
                CC: rguenther at suse dot de
  Target Milestone: ---

For the test case

#include <stdlib.h>

int foo(unsigned char a, unsigned char b)
{
    int isum=abs(a - b);
    return isum;
}

with -O2 -march=corei7 GCC generates:

        movzx   edi, dil
        movzx   esi, sil
        movd    xmm1, edi
        movd    xmm0, esi
        movdqa  xmm3, xmm1
        psubd   xmm3, xmm0
        psubd   xmm0, xmm1
        pmaxsd  xmm0, xmm3
        movd    eax, xmm0
        ret

while with -O2 -march=x86-64 it generates:

        movzx   eax, dil
        movzx   esi, sil
        sub     eax, esi
        cdq
        xor     eax, edx
        sub     eax, edx
        ret

In another case, with -O2 -march=corei7 -mtune=generic, it generates:

        movzx   edi, dil
        movzx   esi, sil
        mov     eax, edi
        sub     eax, esi
        sub     esi, edi
        cmp     eax, esi
        cmovl   eax, esi
        ret

This has happened since r277481 (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91154). In the STV2 pass the
transform is performed because the sse_to_integer RTL cost for corei7 is 2,
which makes the conversion appear profitable for some cmove instructions. I
suspect it affects most Intel processors with a similar cost setting.

The STV conversion causes roughly a 7% regression on 525.x264_r. I wonder
whether the conversion is intended to handle cmove; if not, I think it would be
better to raise the sse_to_integer RTL cost to avoid this issue. In my
experiments, 6 was an appropriate value.
