Hi,

As subject, this patch uses __builtin_memcpy to copy vector structures instead of using a union - or constructing a new opaque structure one vector at a time - in each of the vst4[q]_lane Neon intrinsics in arm_neon.h.
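
For readers without the patch to hand, the change in copying style can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in arm_neon.h: vec4x4_t and opaque_xi_t are hypothetical stand-ins for a type such as int32x4x4_t and for __builtin_aarch64_simd_xi, and store4_lane is a hypothetical scalar model of what a four-vector lane store does. It assumes GCC or Clang for __builtin_memcpy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a four-vector structure (four vectors of
   four 32-bit lanes each).  Not the real arm_neon.h type.  */
typedef struct { int32_t val[4][4]; } vec4x4_t;

/* Hypothetical stand-in for the opaque 64-byte builtin type.  */
typedef struct { unsigned char bytes[64]; } opaque_xi_t;

_Static_assert (sizeof (vec4x4_t) == sizeof (opaque_xi_t),
                "stand-in types must have the same size");

/* Union style: reinterpret the structure through a union.  */
static opaque_xi_t
to_opaque_union (vec4x4_t v)
{
  union { vec4x4_t i; opaque_xi_t o; } u = { v };
  return u.o;
}

/* memcpy style: copy the whole structure in one call.  */
static opaque_xi_t
to_opaque_memcpy (vec4x4_t v)
{
  opaque_xi_t o;
  __builtin_memcpy (&o, &v, sizeof (o));
  return o;
}

/* Scalar model of a four-vector lane store: write element LANE of each
   of the four vectors to four consecutive elements at PTR.  */
static void
store4_lane (int32_t *ptr, vec4x4_t v, int lane)
{
  assert (lane >= 0 && lane < 4);
  opaque_xi_t o = to_opaque_memcpy (v);
  for (int i = 0; i < 4; i++)
    memcpy (&ptr[i], &o.bytes[(i * 4 + lane) * sizeof (int32_t)],
            sizeof (int32_t));
}
```

Both conversion helpers produce the same bytes. The sketch only shows the two source-level styles side by side and makes no claim about the code either one generates.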
It also adds new code generation tests to verify that superfluous move instructions are not generated for the vst4q_lane intrinsics.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-07-29  Jonathan Wright  <jonathan.wri...@arm.com>

	* config/aarch64/arm_neon.h (__ST4_LANE_FUNC): Delete.
	(__ST4Q_LANE_FUNC): Delete.
	(vst4_lane_f16): Use __builtin_memcpy to copy vector structure
	instead of constructing __builtin_aarch64_simd_xi one vector at
	a time.
	(vst4_lane_f32): Likewise.
	(vst4_lane_f64): Likewise.
	(vst4_lane_p8): Likewise.
	(vst4_lane_p16): Likewise.
	(vst4_lane_p64): Likewise.
	(vst4_lane_s8): Likewise.
	(vst4_lane_s16): Likewise.
	(vst4_lane_s32): Likewise.
	(vst4_lane_s64): Likewise.
	(vst4_lane_u8): Likewise.
	(vst4_lane_u16): Likewise.
	(vst4_lane_u32): Likewise.
	(vst4_lane_u64): Likewise.
	(vst4_lane_bf16): Likewise.
	(vst4q_lane_f16): Use __builtin_memcpy to copy vector structure
	instead of using a union.
	(vst4q_lane_f32): Likewise.
	(vst4q_lane_f64): Likewise.
	(vst4q_lane_p8): Likewise.
	(vst4q_lane_p16): Likewise.
	(vst4q_lane_p64): Likewise.
	(vst4q_lane_s8): Likewise.
	(vst4q_lane_s16): Likewise.
	(vst4q_lane_s32): Likewise.
	(vst4q_lane_s64): Likewise.
	(vst4q_lane_u8): Likewise.
	(vst4q_lane_u16): Likewise.
	(vst4q_lane_u32): Likewise.
	(vst4q_lane_u64): Likewise.
	(vst4q_lane_bf16): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vector_structure_intrinsics.c: Add new tests.
rb14728.patch