Hi!

On Sat, Apr 08, 2023 at 09:34:51AM -0400, Michael Meissner wrote:
> The Altivec instructions vmaddfp and vnmsubfp have different rounding
> behaviors than the VSX xvmaddsp and xvnmsubsp instructions.  In particular,
> generating these instructions seems to break Eigen on big endian systems.

What actually breaks Eigen is not the rounding behaviour (it runs with
RN=0b00, like most things do, so round-to-nearest-ties-to-even, the only
rounding mode supported by the VMX float insns, no big shocking surprise
there).

What breaks Eigen and many other unsuspecting programs unknowingly using VMX
is that on Linux programs are started with VSCR[NJ]=1, "Non-Java mode",
which means all numbers with biased exponent 0 get the mantissa forced to 0
as well, both on input and output (all denormals are flushed to zero of the
same sign).

This is counter to the various ABIs.  I'll submit a patch to Linux soon.
But since many people run older kernels, at least for a while more, we need
to fix this in GCC.  Like your patch does.

> 	PR target/70243
> 	* config/rs6000/rs6000.md (vsx_fmav4sf4): Do not generate vmaddfp.
> 	(vsx_nfmsv4sf4): Do not generate vnmsubfp.

> -;; Fused vector multiply/add instructions.  Support the classical Altivec
> -;; versions of fma, which allows the target to be a separate register from the
> -;; 3 inputs.  Under VSX, the target must be either the addend or the first
> -;; multiply.
> -
> +;; Fused vector multiply/add instructions.  Do not generate the Altivec versions
> +;; of fma (vmaddfp and vnmsubfp).  These instructions allows the target to be a
> +;; separate register from the 3 inputs, but they have different rounding
> +;; behaviors than the VSX instructions.

Please mention the VSCR[NJ] thing here as well?  Just something very short,
just mentioning "NJ" or "Non-Java" is enough.

With that: okay for trunk, thank you!  Also okay for all backports.


Segher
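
For illustration only, here is a rough sketch in portable C of what NJ=1
does to a single-precision value, as described above.  It emulates the
flush in software; it does not touch VSCR or use any VMX insn, and the
names nj_flush, float_bits and nj_flush_demo are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw IEEE 754 single-precision bit pattern of x.  */
uint32_t float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Hypothetical helper: emulate the effect of VSCR[NJ]=1 on one float
   operand or result.  If the exponent field (bits 30..23) is zero, the
   value is a zero or a denormal, and the mantissa is forced to zero.
   The sign bit is kept, so denormals become a zero of the same sign.  */
float nj_flush(float x)
{
    uint32_t bits = float_bits(x);
    if ((bits & 0x7f800000u) == 0)
        bits &= 0x80000000u;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Small self-check: denormals are flushed, normal numbers are untouched.  */
void nj_flush_demo(void)
{
    assert(float_bits(nj_flush(1e-40f)) == 0x00000000u);   /* +denormal -> +0 */
    assert(float_bits(nj_flush(-1e-40f)) == 0x80000000u);  /* -denormal -> -0 */
    assert(nj_flush(1.5f) == 1.5f);                        /* normal: unchanged */
}
```

With NJ=1 the VMX float insns behave as if this were applied to every
input and to the result, which is why code expecting IEEE denormal
handling gets different answers from vmaddfp than from xvmaddsp.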