https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116979

--- Comment #9 from Paul Caprioli <paul at hpkfft dot com> ---
I would like to offer two suggestions for your consideration:

1. To improve accuracy, the mul_fma code should be generated for complex
operator* regardless of how the flag -ffp-contract is set.  The source code
author has already "contracted" the expression into something as small as
possible, a single *.  The compiler needs to expand this into two
floating-point instructions (mul_fma), but it should not expand it further
into three by emitting mul_mul_addsub.  The user's setting -ffp-contract=off
does not imply a requirement for greater expansion.  (Intuitively, each
additional floating-point instruction introduces another rounding step, so
more of them worsen the final accuracy of the complex product.)
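
To make the two expansions concrete, here is a minimal scalar sketch in C++.
The function names simply reuse the labels mul_mul_addsub and mul_fma from this
report; they are hypothetical helpers for illustration, not GCC internals, and
the code ignores the NaN/infinity handling a full complex multiply may need.
Build with -ffp-contract=off if you want the first variant to stay
uncontracted.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Illustrative helper (name assumed): (a+bi)(c+di) in three stages.
// Stage 1: a*c, a*d.  Stage 2: b*d, b*c.  Stage 3: subtract / add.
// Every product is rounded before the final add or subtract.
std::complex<double> mul_mul_addsub(std::complex<double> x,
                                    std::complex<double> y) {
    const double a = x.real(), b = x.imag();
    const double c = y.real(), d = y.imag();
    const double ac = a * c, ad = a * d;   // first multiply
    const double bd = b * d, bc = b * c;   // second multiply
    return {ac - bd, ad + bc};             // add/sub
}

// Illustrative helper (name assumed): the same product in two stages.
// Stage 1: b*d, b*c (rounded).  Stage 2: fused multiply-add, where
// a*c and a*d are not rounded before being combined.
std::complex<double> mul_fma(std::complex<double> x,
                             std::complex<double> y) {
    const double a = x.real(), b = x.imag();
    const double c = y.real(), d = y.imag();
    const double bd = b * d, bc = b * c;   // multiply
    return {std::fma(a, c, -bd),           // a*c - bd, single rounding
            std::fma(a, d, bc)};           // a*d + bc, single rounding
}
```

Both helpers agree on inputs whose products are exactly representable; they can
differ in the last bits only when a*c or a*d has to be rounded.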

2. When compiling half precision with -mavx512fp16, the VFMULCPH instruction
should be emitted.  The Intel SDM
(https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html)
specifies that this instruction is implemented in hardware as mul_fma.  Again,
this instruction should be used regardless of how -ffp-contract is set.
An example is here: https://godbolt.org/z/46hqP78GP
