Kyrylo Tkachov <kyrylo.tkac...@arm.com> writes:
> Hi Jonathan,
>
>> -----Original Message-----
>> From: Jonathan Wright <jonathan.wri...@arm.com>
>> Sent: 27 January 2021 16:03
>> To: gcc-patches@gcc.gnu.org
>> Cc: Kyrylo Tkachov <kyrylo.tkac...@arm.com>
>> Subject: [PATCH] aarch64: Use GCC vector extensions for FP ml[as]_n
>> intrinsics
>>
>> Hi,
>>
>> As subject, this patch rewrites floating-point mla_n/mls_n intrinsics to use
>> a + b * c / a - b * c rather than inline assembly code, allowing for better
>> scheduling and optimization.
>>
>> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
>> issues.
>>
>> Ok for master?
>
> I'm quite keen to remove that ugly inline asm, but I'm a bit concerned about
> the floating-point semantics now being affected by things like FP
> contractions.
> The intrinsics are supposed to preserve the semantics of the instructions as
> much as possible.
> Richard, does this mean we'll want to implement this using RTL builtins, like
> for the integer ones?

It seems like a grey area in the spec.  E.g. vmlaq_f32 is described as:

  RESULT[I] = a[i] + (b[i] * c[i]) for i = 0 to 3

which could be taken to mean that it behaves in the same way as the C
arithmetic would, and so should be subject to -ffp-contract.

At the moment, a separate vmulq_f32 and vaddq_f32 could be fused, but
that's arguably a bug, since the spec says that they should behave like
FMUL and FADD respectively.  So:

* At the moment, vmla_* is the only way of forcibly disabling fusing.

* -ffp-contract has different defaults between Clang and GCC, and the
  default GCC behaviour would be to contract the vmlas.

* It would be a change in behaviour from previous releases.

So I agree we should probably use builtins.  We'd need to be careful
that we don't grow define_insns or RTL optimisations that do their
own fusing of separate MULTs and ADDs.

I think we should have new tests to make sure that we generate separate
FMULs and FADDs, if we don't already.

Thanks,
Richard
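
For illustration only, here is a minimal scalar sketch in portable C of why contraction changes results. It is not taken from the patch and does not use the intrinsics. The function names mla_separate and mla_single_rounding are hypothetical. The second function only mimics a fused multiply-add by widening to double, which is an assumption that holds for the inputs mentioned below but is not a general replacement for a real fused instruction.

```c
/* Hypothetical helper: multiply, round the product to float, then add.
   The volatile store forces the product to be rounded to float and
   stops the compiler from contracting the two operations into one
   fused multiply-add, regardless of the -ffp-contract setting. */
float mla_separate(float a, float b, float c)
{
    volatile float prod = b * c;   /* first rounding */
    return a + prod;               /* second rounding */
}

/* Hypothetical helper: keep the product unrounded before the add.
   The product of two floats is exact in double, so for the inputs
   discussed below this gives the result a fused operation would give.
   In general the final conversion can add a second rounding step. */
float mla_single_rounding(float a, float b, float c)
{
    return (float)((double)a + (double)b * (double)c);
}
```

With a = -1, b = 1 + 2^-13 and c = 1 - 2^-13, the exact product is 1 - 2^-26, which rounds to 1.0 in single precision. mla_separate therefore returns 0, while mla_single_rounding returns -2^-26. This is the kind of difference that would appear if the intrinsics became subject to contraction.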