Hi,

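As the subject says, this patch rewrites the floating-point vml[as][q] Neon
intrinsics to use RTL builtins rather than relying on the GCC vector
extensions. Using RTL builtins gives us control over whether fmla/fmls
instructions are emitted, and here we don't want them: a fused operation
rounds only once, so it can give a different result from a separate
multiply followed by an add or subtract.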

With this commit, the code generated by these intrinsics changes from
a fused multiply-add/subtract instruction to an fmul followed by an
fadd/fsub instruction. If the programmer really wants fmla/fmls
instructions, they can use the vfm[as] intrinsics.
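The rounding difference behind this change can be shown without Neon hardware. The portable C sketch below models a single lane of vmla_f32 (a + b * c) as the separate multiply-then-add that this patch generates, and compares it with a fused multiply-add. The function names are hypothetical and are not part of arm_neon.h; the fused version is emulated in double, which is exact for the inputs discussed here but is not a general replacement for a true FMA.

```c
/* Hypothetical scalar model of one lane of vmla_f32 (a + b * c) as
   generated after this patch: fmul then fadd, i.e. the product is rounded
   to float before the add, and the sum is rounded again. */
float mla_unfused(float a, float b, float c)
{
    /* Storing the product through a volatile float forces it to be rounded
       to float and stops the compiler from contracting b * c + a into an
       FMA instruction. */
    volatile float prod = b * c;
    return a + prod;
}

/* Hypothetical scalar model of one lane of a fused multiply-add (fmla, as
   emitted for vfma_f32): the product is kept exact and only the final
   result is rounded.  The product of two floats is exact in double
   (24 + 24 significant bits fit in 53), but the double sum may itself be
   rounded before the final conversion to float, so this emulation is only
   a true single rounding when that sum is exact in double, as it is for
   the example inputs a = -1, b = c = 1 + 2^-12. */
float mla_fused_via_double(float a, float b, float c)
{
    return (float)((double)b * (double)c + (double)a);
}
```

For a = -1 and b = c = 1 + 2^-12, the exact product is 1 + 2^-11 + 2^-24. Rounding it to float first is a tie that goes to the even neighbour 1 + 2^-11, so the unfused version returns 2^-11, while the fused version keeps the low term and returns 2^-11 + 2^-24.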

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-02-16  Jonathan Wright  <jonathan.wri...@arm.com>

        * config/aarch64/aarch64-simd-builtins.def: Add float_ml[as]
        builtin generator macros.
        * config/aarch64/aarch64-simd.md (aarch64_float_mla<mode>):
        Define.
        (aarch64_float_mls<mode>): Define.
        * config/aarch64/arm_neon.h (vmla_f32): Use RTL builtin
        instead of relying on GCC vector extensions.
        (vmla_f64): Likewise.
        (vmlaq_f32): Likewise.
        (vmlaq_f64): Likewise.
        (vmls_f32): Likewise.
        (vmls_f64): Likewise.
        (vmlsq_f32): Likewise.
        (vmlsq_f64): Likewise.
        * config/aarch64/iterators.md: Define VDQF_DF mode iterator.

Attachment: rb14211.patch
