On Fri, Jul 23, 2021 at 10:22 AM Jonathan Wright via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> Hi,
>
> This patch uses __builtin_memcpy to copy vector structures instead of
> building a new opaque structure one vector at a time in each of the
> vqtbl[234] Neon intrinsics in arm_neon.h. This simplifies the header file
> and also improves code generation - superfluous move instructions
> were emitted for every register extraction/set in this additional
> structure.
>
> Add new code generation tests to verify that superfluous move
> instructions are no longer generated for the vqtbl[234] intrinsics.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-07-08 Jonathan Wright <jonathan.wri...@arm.com>
>
>     * config/aarch64/arm_neon.h (vqtbl2_s8): Use __builtin_memcpy
>     instead of constructing __builtin_aarch64_simd_oi one vector
>     at a time.
>     (vqtbl2_u8): Likewise.
>     (vqtbl2_p8): Likewise.
>     (vqtbl2q_s8): Likewise.
>     (vqtbl2q_u8): Likewise.
>     (vqtbl2q_p8): Likewise.
>     (vqtbl3_s8): Use __builtin_memcpy instead of constructing
>     __builtin_aarch64_simd_ci one vector at a time.
>     (vqtbl3_u8): Likewise.
>     (vqtbl3_p8): Likewise.
>     (vqtbl3q_s8): Likewise.
>     (vqtbl3q_u8): Likewise.
>     (vqtbl3q_p8): Likewise.
>     (vqtbl4_s8): Use __builtin_memcpy instead of constructing
>     __builtin_aarch64_simd_xi one vector at a time.
>     (vqtbl4_u8): Likewise.
>     (vqtbl4_p8): Likewise.
>     (vqtbl4q_s8): Likewise.
>     (vqtbl4q_u8): Likewise.
>     (vqtbl4q_p8): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>     * gcc.target/aarch64/vector_structure_intrinsics.c: New test.
>

Hi,

This new test fails on aarch64_be:
FAIL: gcc.target/aarch64/vector_structure_intrinsics.c scan-assembler-not mov\\t

Can you check?

Thanks

Christophe
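
For reference, a minimal portable sketch of the transformation the patch description talks about: building an opaque structure one vector at a time versus copying the whole vector structure with a single memcpy. This is not the arm_neon.h code. The types vec_pair_t and opaque_pair_t and all function names are hypothetical stand-ins (for a two-vector structure and for an opaque type such as __builtin_aarch64_simd_oi), and plain memcpy stands in for __builtin_memcpy so that it builds on any target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a two-vector structure (two 16-byte vectors). */
typedef struct { int8_t val[2][16]; } vec_pair_t;

/* Hypothetical stand-in for the opaque two-register type. */
typedef struct { unsigned char bytes[32]; } opaque_pair_t;

/* The single-copy version is only valid if both types have the same size. */
_Static_assert(sizeof(vec_pair_t) == sizeof(opaque_pair_t),
               "vector structure and opaque type must have the same size");

/* Old style: fill the opaque value one vector at a time. */
opaque_pair_t pack_one_at_a_time(vec_pair_t tab)
{
    opaque_pair_t o;
    memcpy(&o.bytes[0], tab.val[0], 16);   /* first vector */
    memcpy(&o.bytes[16], tab.val[1], 16);  /* second vector */
    return o;
}

/* New style: copy the whole vector structure in one call. */
opaque_pair_t pack_whole(vec_pair_t tab)
{
    opaque_pair_t o;
    memcpy(&o, &tab, sizeof(o));
    return o;
}

/* Returns 1 when both ways of packing produce identical bytes. */
int packings_agree(void)
{
    vec_pair_t tab;
    for (int v = 0; v < 2; v++)
        for (int i = 0; i < 16; i++)
            tab.val[v][i] = (int8_t)(v * 16 + i);

    opaque_pair_t a = pack_one_at_a_time(tab);
    opaque_pair_t b = pack_whole(tab);
    return memcmp(&a, &b, sizeof(a)) == 0;
}
```

The sketch only shows that the two forms are equivalent in the bytes they produce. Whether the compiler emits extra mov instructions for either form depends on the target and endianness, which is what the scan-assembler-not check in the new test is looking at.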