https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104010
            Bug ID: 104010
           Summary: [12 regression] short loop no longer vectorized with Neon
                    after r12-6513
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: clyon at gcc dot gnu.org
  Target Milestone: ---

This short loop:
void test_vcmpeq_s32x2 (int32_t * __restrict__ dest, int32_t *a, int32_t *b)
{
  int i;
  for (i=0; i<2; i++) {
    dest[i] = a[i] == b[i];
  }
}

used to be vectorized as:
test_vcmpeq_s32x2:
        vld1.32 {d16}, [r1]
        vmov.i32        d17, #0x1  @ v2si
        vld1.32 {d19}, [r2]
        vmov.i32        d18, #0  @ v2si
        vceq.i32        d16, d16, d19
        vbsl    d16, d17, d18
        vst1.32 {d16}, [r0]
        bx      lr

After r12-6513, we get:
test_vcmpeq_s32x2:
        ldr     ip, [r1]
        ldr     r3, [r1, #4]
        str     lr, [sp, #-4]!
        ldr     lr, [r2]
        ldr     r2, [r2, #4]
        sub     ip, ip, lr
        clz     ip, ip
        sub     r3, r3, r2
        lsr     ip, ip, #5
        clz     r3, r3
        lsr     r3, r3, #5
        str     ip, [r0]
        str     r3, [r0, #4]
        ldr     pc, [sp], #4

when compiling for arm-none-linux-gnueabihf with -mcpu=cortex-a9 -mfpu=neon
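
For reference, here is a minimal C sketch of what each of the two sequences above computes per element. It is not taken from GCC or from the testcase; the helper names clz32, eq_via_clz and eq_via_mask are hypothetical and only model the instructions. The scalar code gets the 0/1 result from sub / clz / lsr #5, since clz returns 32 only for a zero difference. The Neon code builds an all-ones or all-zeros lane mask with vceq.i32 and then uses vbsl to select between the constants 1 and 0.

```c
#include <stdint.h>

/* Hypothetical model of the ARM clz instruction: number of leading
   zero bits in a 32-bit value, with clz(0) == 32. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Models the scalar sequence: sub, clz, lsr #5.
   Only a zero difference gives clz == 32, and 32 >> 5 == 1;
   every other count is in 0..31 and shifts down to 0. */
static int32_t eq_via_clz(int32_t a, int32_t b)
{
    uint32_t diff = (uint32_t)a - (uint32_t)b;
    return (int32_t)(clz32(diff) >> 5);
}

/* Models one lane of the Neon sequence: vceq.i32 produces an
   all-ones mask when the lanes are equal, and vbsl takes bits from
   the first operand (constant 1) where the mask is set and from the
   second operand (constant 0) where it is clear. */
static int32_t eq_via_mask(int32_t a, int32_t b)
{
    uint32_t mask = (a == b) ? 0xFFFFFFFFu : 0u;
    return (int32_t)((mask & 1u) | (~mask & 0u));
}
```

Both helpers return the same value as the C expression a == b for any pair of inputs, so the regression is about the chosen code sequence (scalar instead of Neon), not about the result.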