https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97903
Bug ID: 97903
Summary: [ARM NEON] Missed optimization in lowering test operation
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: prathamesh3492 at gcc dot gnu.org
Target Milestone: ---
Hi,
For the following test-case:
#include <arm_neon.h>

uint8x8_t f1(int8x8_t a, int8x8_t b) {
  return (uint8x8_t) ((a & b) != 0);
}

uint8x8_t f2(int8x8_t a, int8x8_t b) {
  return vtst_s8 (a, b);
}
Code-gen:
f2:
        vtst.8  d0, d0, d1
        bx      lr
f1:
        vmov.i32        d16, #0  @ v8qi
        vand    d1, d0, d1
        vmov.i32        d17, #0xffffffff  @ v8qi
        vceq.i8 d1, d1, d16
        vbsl    d1, d16, d17
        vmov    d0, d1  @ v8qi
        bx      lr
The optimized dump for f1 shows:
_1 = a_4(D) & b_5(D);
_3 = .VCOND (_1, { 0, 0, 0, 0, 0, 0, 0, 0 }, { -1, -1, -1, -1, -1, -1, -1, -1 }, { 0, 0, 0, 0, 0, 0, 0, 0 }, 113);
_6 = VIEW_CONVERT_EXPR<uint8x8_t>(_3);
I think we miss an opportunity to combine the AND followed by the VCOND into a
single vector test instruction. Should we add a .VTEST internal function that
expands to vtst? Or, alternatively, add a peephole pattern in the backend?
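
To illustrate why the combination should be safe, here is a minimal scalar
sketch in plain C (no NEON required) that models a single 8-bit lane of vtst.8
and of the sequence currently emitted for f1, and compares them over all input
pairs. The helper names (lane_vtst, lane_f1_sequence, count_mismatches,
check_equivalence) are hypothetical and only for illustration. Lanes are
modeled as unsigned bit patterns, which is an assumption made for simplicity.
This is not GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 8-bit lane of vtst.8:
   all ones if (a & b) has any bit set, otherwise zero. */
uint8_t lane_vtst(uint8_t a, uint8_t b) {
    return (a & b) != 0 ? 0xFF : 0x00;
}

/* Scalar model of one lane of the sequence currently emitted for f1. */
uint8_t lane_f1_sequence(uint8_t a, uint8_t b) {
    uint8_t zero = 0x00;                     /* vmov.i32 d16, #0          */
    uint8_t t    = (uint8_t)(a & b);         /* vand     d1, d0, d1       */
    uint8_t ones = 0xFF;                     /* vmov.i32 d17, #0xffffffff */
    uint8_t eq   = (t == zero) ? 0xFF : 0x00; /* vceq.i8 d1, d1, d16      */
    /* vbsl d1, d16, d17: take bits from d16 where the mask (d1) is set,
       and bits from d17 where it is clear. */
    return (uint8_t)((eq & zero) | (~eq & ones));
}

/* Compare both models over all 256 * 256 lane values. */
int count_mismatches(void) {
    int mismatches = 0;
    for (int a = 0; a < 256; a++) {
        for (int b = 0; b < 256; b++) {
            if (lane_vtst((uint8_t)a, (uint8_t)b)
                != lane_f1_sequence((uint8_t)a, (uint8_t)b)) {
                mismatches++;
            }
        }
    }
    return mismatches;
}

/* Abort if any input pair disagrees between the two models. */
void check_equivalence(void) {
    assert(count_mismatches() == 0);
}
```

Calling check_equivalence() from a small test driver exercises every lane
value. Under this model the two forms agree lane by lane, which is the
property a .VTEST expansion or a backend pattern would rely on.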
Thanks,
Prathamesh