Updated the patch to be more consistent with the others in the series. Tested and bootstrapped on aarch64-none-linux-gnu - no issues.
Ok for master? Thanks, Jonathan ________________________________ From: Gcc-patches <gcc-patches-boun...@gcc.gnu.org> on behalf of Jonathan Wright via Gcc-patches <gcc-patches@gcc.gnu.org> Sent: 28 April 2021 15:42 To: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org> Subject: [PATCH 13/20] aarch64: Use RTL builtins for FP ml[as][q]_laneq intrinsics Hi, As subject, this patch rewrites the floating-point vml[as][q]_laneq Neon intrinsics to use RTL builtins rather than relying on the GCC vector extensions. Using RTL builtins allows control over the emission of fmla/fmls instructions (which we don't want here.) With this commit, the code generated by these intrinsics changes from a fused multiply-add/subtract instruction to an fmul followed by an fadd/fsub instruction. If the programmer really wants fmla/fmls instructions, they can use the vfm[as] intrinsics. Regression tested and bootstrapped on aarch64-none-linux-gnu - no issues. Ok for master? Thanks, Jonathan --- gcc/ChangeLog: 2021-02-17 Jonathan Wright <jonathan.wri...@arm.com> * config/aarch64/aarch64-simd-builtins.def: Add float_ml[as][q]_laneq builtin generator macros. * config/aarch64/aarch64-simd.md (mul_laneq<mode>3): Define. (aarch64_float_mla_laneq<mode>): Define. (aarch64_float_mls_laneq<mode>): Define. * config/aarch64/arm_neon.h (vmla_laneq_f32): Use RTL builtin instead of GCC vector extensions. (vmlaq_laneq_f32): Likewise. (vmls_laneq_f32): Likewise. (vmlsq_laneq_f32): Likewise.
rb14213.patch
Description: rb14213.patch