Kyrylo Tkachov <kyrylo.tkac...@arm.com> writes:
> Hi Jonathan,
>
>> -----Original Message-----
>> From: Jonathan Wright <jonathan.wri...@arm.com>
>> Sent: 27 January 2021 16:03
>> To: gcc-patches@gcc.gnu.org
>> Cc: Kyrylo Tkachov <kyrylo.tkac...@arm.com>
>> Subject: [PATCH] aarch64: Use GCC vector extensions for FP ml[as]_n
>> intrinsics
>>
>> Hi,
>>
>> As subject, this patch rewrites the floating-point mla_n/mls_n intrinsics to
>> use a + b * c (mla) and a - b * c (mls) rather than inline assembly code,
>> allowing for better scheduling and optimization.
>>
>> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
>> issues.
>>
>> Ok for master?
>
> I'm quite keen to remove that ugly inline asm, but I'm a bit concerned about 
> the floating-point semantics now being affected by things like FP 
> contractions.
> The intrinsics are supposed to preserve the semantics of the instructions as 
> much as possible.
> Richard, does this mean we'll want to implement this using RTL builtins, like 
> for the integer ones?

It seems like a grey area in the spec.  E.g. vmlaq_f32 is described as:

    RESULT[I] = a[i] + (b[i] * c[i]) for i = 0 to 3

which could be taken to mean that it behaves in the same way as the
C arithmetic would, and so should be subject to -ffp-contract.

At the moment, a separate vmulq_f32 and vaddq_f32 could be fused,
but that's arguably a bug, since the spec says that they should
behave like FMUL and FADD respectively.  So:

* At the moment, vmla_* is the only way of forcibly disabling fusing.

* -ffp-contract has different defaults between Clang and GCC,
  and the default GCC behaviour would be to contract the vmlas.

* It would be a change in behaviour from previous releases.

So I agree we should probably use builtins.

We'd need to be careful that we don't grow define_insns or RTL
optimisations that do their own fusing of separate MULTs and ADDs.
I think we should have new tests to make sure that we generate
separate FMULs and FADDs, if we don't already.
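
As a rough sketch of what a runtime version of such a test might look like
(this is not an existing testsuite file, and a scan-assembler test would be
the more direct check), the following compares vmlaq_f32 against an unfused
scalar reference on inputs where fusing changes the result. The names
unfused_ref and count_fused_lanes are hypothetical, and the AArch64 part is
guarded so that the file still compiles on other targets:

```c
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Unfused reference: the product is rounded to float before the add.
   The volatile store stops the compiler contracting it into an FMA.  */
float
unfused_ref (float a, float b, float c)
{
  volatile float prod = b * c;
  return a + prod;
}

/* Count the lanes in which vmlaq_f32 differs from the unfused reference.
   Returns 0 on non-AArch64 targets, where there is nothing to check.  */
int
count_fused_lanes (void)
{
  int mismatches = 0;
#if defined(__aarch64__)
  /* Volatile reads keep the inputs from being constant-folded.  */
  volatile float a_in = -(1.0f + 0x1p-11f);
  volatile float b_in = 1.0f + 0x1p-12f;
  float a = a_in, b = b_in;
  float out[4];

  vst1q_f32 (out, vmlaq_f32 (vdupq_n_f32 (a), vdupq_n_f32 (b),
                             vdupq_n_f32 (b)));
  for (int i = 0; i < 4; i++)
    if (out[i] != unfused_ref (a, b, b))
      mismatches++;
#endif
  return mismatches;
}
```

A nonzero return value would indicate that the multiply and add were fused
somewhere along the way, whatever the -ffp-contract setting.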

Thanks,
Richard
