https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91594

            Bug ID: 91594
           Summary: Missing horizontal addition auto-vectorization
           Product: gcc
           Version: 9.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: diegoandres91b at hotmail dot com
  Target Milestone: ---

The following code (compiled with -O3 -ffast-math -msse3):

float a2[4], b2[4], c2[4];

void hadd2() {
    c2[0] = a2[0] + a2[1];
    c2[1] = a2[2] + a2[3];
    c2[2] = b2[0] + b2[1];
    c2[3] = b2[2] + b2[3];
}

Compiles without auto-vectorization:

hadd2():
        movss   xmm0, DWORD PTR a2[rip]
        addss   xmm0, DWORD PTR a2[rip+4]
        movss   DWORD PTR c2[rip], xmm0
        movss   xmm0, DWORD PTR a2[rip+8]
        addss   xmm0, DWORD PTR a2[rip+12]
        movss   DWORD PTR c2[rip+4], xmm0
        movss   xmm0, DWORD PTR b2[rip]
        addss   xmm0, DWORD PTR b2[rip+4]
        movss   DWORD PTR c2[rip+8], xmm0
        movss   xmm0, DWORD PTR b2[rip+8]
        addss   xmm0, DWORD PTR b2[rip+12]
        movss   DWORD PTR c2[rip+12], xmm0
        ret

The expected code, using the HADDPS instruction (which GCC does not generate):

hadd2():
        movaps  xmm0, XMMWORD PTR a2[rip]
        haddps  xmm0, XMMWORD PTR b2[rip]
        movaps  XMMWORD PTR c2[rip], xmm0
        ret
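
For reference, the same result can be requested explicitly through the SSE3 intrinsic _mm_hadd_ps. This is only a sketch of a manual workaround, not a fix for the missing auto-vectorization. The function name hadd2_intrin is made up for this example, and the scalar fallback is there only so the file still builds when SSE3 is not enabled:

```c
#if defined(__SSE3__)
#include <pmmintrin.h>  /* SSE3 intrinsics, including _mm_hadd_ps */
#endif

float a2[4], b2[4], c2[4];

/* Hypothetical hand-written equivalent of hadd2(). */
void hadd2_intrin(void) {
#if defined(__SSE3__)
    /* Unaligned loads/stores are used because plain C does not
       guarantee 16-byte alignment for these arrays. */
    __m128 a = _mm_loadu_ps(a2);
    __m128 b = _mm_loadu_ps(b2);
    /* HADDPS lane layout: { a0+a1, a2+a3, b0+b1, b2+b3 } */
    _mm_storeu_ps(c2, _mm_hadd_ps(a, b));
#else
    /* Scalar fallback, identical to hadd2() in the report. */
    c2[0] = a2[0] + a2[1];
    c2[1] = a2[2] + a2[3];
    c2[2] = b2[0] + b2[1];
    c2[3] = b2[2] + b2[3];
#endif
}
```

Built with -O3 -msse3, the intrinsic path should map to a single haddps, though the exact loads and stores the compiler picks may differ from the listing above.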

In contrast, the ordinary element-wise addition:

void add2() {
    c2[0] = a2[0] + b2[0];
    c2[1] = a2[1] + b2[1];
    c2[2] = a2[2] + b2[2];
    c2[3] = a2[3] + b2[3];
}

Compiles with auto-vectorization:

add2():
        movaps  xmm0, XMMWORD PTR a2[rip]
        addps   xmm0, XMMWORD PTR b2[rip]
        movaps  XMMWORD PTR c2[rip], xmm0
        ret

Compiler Explorer Code: https://gcc.godbolt.org/z/9Hs9su
