Re: [PATCH 13/20] aarch64: Use RTL builtins for FP ml[as][q]_laneq intrinsics

Jonathan Wright via Gcc-patches Fri, 30 Apr 2021 09:30:07 -0700

Updated the patch to be more consistent with the others in the series.

Tested and bootstrapped on aarch64-none-linux-gnu - no issues.

Ok for master?

Thanks,
Jonathan
________________________________
From: Gcc-patches <gcc-patches-boun...@gcc.gnu.org> on behalf of Jonathan Wright via Gcc-patches <gcc-patches@gcc.gnu.org>
Sent: 28 April 2021 15:42
To: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>
Subject: [PATCH 13/20] aarch64: Use RTL builtins for FP ml[as][q]_laneq intrinsics

Hi,

As the subject says, this patch rewrites the floating-point
vml[as][q]_laneq Neon intrinsics to use RTL builtins rather than relying
on the GCC vector extensions. Using RTL builtins gives us control over
whether fmla/fmls instructions are emitted (we don't want them here).

With this commit, the code generated by these intrinsics changes from
a fused multiply-add/subtract instruction to an fmul followed by an
fadd/fsub instruction. If the programmer really wants fmla/fmls
instructions, they can use the vfm[as] intrinsics.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-02-17  Jonathan Wright  <jonathan.wri...@arm.com>

        * config/aarch64/aarch64-simd-builtins.def: Add
        float_ml[as][q]_laneq builtin generator macros.
        * config/aarch64/aarch64-simd.md (mul_laneq<mode>3): Define.
        (aarch64_float_mla_laneq<mode>): Define.
        (aarch64_float_mls_laneq<mode>): Define.
        * config/aarch64/arm_neon.h (vmla_laneq_f32): Use RTL builtin
        instead of GCC vector extensions.
        (vmlaq_laneq_f32): Likewise.
        (vmls_laneq_f32): Likewise.
        (vmlsq_laneq_f32): Likewise.

rb14213.patch
