https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116274

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
struct a { int x,y,z,w; };
int test(struct a a) { return a.x+a.y+a.z+a.w; }

behaves similarly.

I do have a patch for the vectorizer costing that avoids vectorizing in
these cases.  We will still vectorize

struct a { short a0,a1,a2,a3,a4,a5,a6,a7; };
short test(struct a a) { return a.a0+a.a1+a.a2+a.a3+a.a4+a.a5+a.a6+a.a7; }

generating

test:
.LFB0:
        .cfi_startproc
        movaps  %xmm1, -24(%rsp)
        movq    -16(%rsp), %rdx
        movq    %rdi, %xmm1
        movq    %rsi, %xmm3
        pinsrq  $1, %rdx, %xmm1
        punpcklqdq      %xmm3, %xmm1
        movaps  %xmm1, -24(%rsp)
        movdqa  %xmm1, %xmm2
        pinsrq  $1, -16(%rsp), %xmm2
        movdqa  %xmm2, %xmm0
        psrldq  $8, %xmm0
        paddw   %xmm1, %xmm0
        movdqa  %xmm0, %xmm1
        psrldq  $4, %xmm1
        paddw   %xmm1, %xmm0
        movdqa  %xmm0, %xmm1
        psrldq  $2, %xmm1
        paddw   %xmm1, %xmm0
        pextrw  $0, %xmm0, %eax
        ret

as opposed to

test:
.LFB0:
        .cfi_startproc
        movl    %edi, %eax
        movq    %rdi, %rdx
        sarl    $16, %eax
        salq    $16, %rdx
        addl    %edi, %eax
        sarq    $48, %rdx
        addl    %edx, %eax
        sarq    $48, %rdi
        movl    %esi, %edx
        addl    %edi, %eax
        sarl    $16, %edx
        addl    %esi, %eax
        addl    %edx, %eax
        movq    %rsi, %rdx
        sarq    $48, %rsi
        salq    $16, %rdx
        sarq    $48, %rdx
        addl    %edx, %eax
        addl    %esi, %eax
        ret

it still has the odd (dead)

        movaps  %xmm1, -24(%rsp)
        movq    -16(%rsp), %rdx

The

        movaps  %xmm1, -24(%rsp)
        movdqa  %xmm1, %xmm2
        pinsrq  $1, -16(%rsp), %xmm2

codegen is probably an RA/LRA artifact caused by bad instruction constraints
and the refusal to reload to a GPR.  Not sure whether a move-high-to-GPR
instruction exists; pextrq would work for sure.  But an unpck looks like a
better match anyway.

Reply via email to