https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116979
--- Comment #9 from Paul Caprioli <paul at hpkfft dot com> ---
I would like to offer two suggestions for your consideration:

1. To improve accuracy, the mul_fma code should be generated for complex operator* regardless of how -ffp-contract is set. The author of the source code has already "contracted" the expression into something as small as possible: a single *. The compiler needs to expand this into two floating-point instructions, but it should not expand it further into three by emitting mul_mul_addsub. The user's setting -ffp-contract=off does not imply a requirement for greater expansion. (Intuitively, more instructions that each introduce roundoff error tend to worsen the accuracy of the final complex product.)

2. When compiling half-precision complex multiplication with -mavx512fp16, the VFMULCPH instruction should be emitted. The Intel SDM (https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html) specifies that this instruction is implemented in hardware as mul_fma. Again, this instruction should be used regardless of how -ffp-contract is set.

An example is here: https://godbolt.org/z/46hqP78GP