https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010

            Bug ID: 104010
           Summary: [12 regression] short loop no longer vectorized with
                    Neon after r12-6513
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: clyon at gcc dot gnu.org
  Target Milestone: ---

This short loop:
#include <stdint.h>

void test_vcmpeq_s32x2 (int32_t * __restrict__ dest, int32_t *a, int32_t *b)
{
  int i;
  for (i=0; i<2; i++) {
    dest[i] = a[i] == b[i];
  }
}

used to be vectorized as:
test_vcmpeq_s32x2:
        vld1.32 {d16}, [r1]
        vmov.i32        d17, #0x1  @ v2si
        vld1.32 {d19}, [r2]
        vmov.i32        d18, #0  @ v2si
        vceq.i32        d16, d16, d19
        vbsl    d16, d17, d18
        vst1.32 {d16}, [r0]
        bx      lr

After r12-6513, we get:
test_vcmpeq_s32x2:
        ldr     ip, [r1]
        ldr     r3, [r1, #4]
        str     lr, [sp, #-4]!
        ldr     lr, [r2]
        ldr     r2, [r2, #4]
        sub     ip, ip, lr
        clz     ip, ip
        sub     r3, r3, r2
        lsr     ip, ip, #5
        clz     r3, r3
        lsr     r3, r3, #5
        str     ip, [r0]
        str     r3, [r0, #4]
        ldr     pc, [sp], #4

when compiling for arm-none-linux-gnueabihf with -mcpu=cortex-a9 -mfpu=neon
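
For reference, here is a minimal C sketch of what the two sequences compute per element. It is only an illustration, not code from GCC: the helper names clz32, eq_scalar and eq_lane are hypothetical, and it assumes a compiler that provides __builtin_clz (GCC or Clang). The scalar code derives the 0/1 result from sub + clz + lsr #5, since clz returns 32 only for a zero difference. The Neon code builds an all-ones/all-zeros mask with vceq and uses vbsl to select between the constants 1 and 0.

```c
#include <stdint.h>

/* Count leading zeros the way the ARM clz instruction does: clz(0) == 32.
   __builtin_clz(0) is undefined, so zero is handled explicitly. */
static unsigned clz32(uint32_t x)
{
  return x ? (unsigned) __builtin_clz(x) : 32u;
}

/* Model of the scalar sequence: sub, clz, lsr #5.
   The difference is zero only when a == b, and only then is clz 32,
   so shifting right by 5 yields 1 for equal inputs and 0 otherwise. */
static int32_t eq_scalar(int32_t a, int32_t b)
{
  uint32_t diff = (uint32_t) a - (uint32_t) b;  /* wraps like the ARM sub */
  return (int32_t) (clz32(diff) >> 5);
}

/* Per-lane model of the Neon sequence: vceq.i32 followed by vbsl. */
static int32_t eq_lane(int32_t a, int32_t b)
{
  uint32_t mask = (a == b) ? 0xffffffffu : 0u;  /* vceq.i32 result */
  uint32_t one = 1u, zero = 0u;                 /* vmov.i32 constants */
  return (int32_t) ((mask & one) | (~mask & zero));  /* vbsl bit select */
}
```

Both helpers should return the same value as the source expression a[i] == b[i] for any pair of inputs, so the regression is about which sequence is chosen, not about correctness.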
