On Tue, May 10, 2022 at 02:56:58PM -0400, Michael Meissner wrote:
> On Tue, May 10, 2022 at 07:27:30AM -0500, Segher Boessenkool wrote:
> > > IMHO, it's something we want to fix as well, based on the reasons:
> > >   1) bif names have the corresponding mnemonics, users would expect 1-1
> > >      mapping here.
> > >   2) clang emits xs{min,max}dp all the time, with cpu type power7/8/9/10.
> > >   3) according to uarch info, xs{min,max}cdp use the same units and have
> > >      the same latency, no benefits to replace with xs{min,max}cdp.
> >
> > I never understood any of this.  Mike?  Why do we do those "c" things
> > at all, ever?
>
> In the power7, we only had x{s,v}{min,max}{sp,dp}.  But those aren't useful
> for optimizing normal (a > b) ? a : b without using -ffast-math.

But RTL smin (as well as Gimple min_expr) is *undefined* without
-ffast-math (well, -ffinite-math-only and -fno-signed-zeros at least).
The only place we generate xs{min,max}[c]dp uses s{min,max}.  So the
much saner xs{min,max}dp are fine always.

> Power9 added the
> 'c' and 'j' versions of the insns.  GCC never generates the 'j' version.
>
> Basically for ?: we generate:
>
>     * Code = power8, no -ffast-math: Generate compare, move;
>     * Code = power8, -ffast-math: Generate xsmaxdp/xsmindp;
>     * Code = power9, no -ffast-math: Generate xsmaxcdp/xsmincdp; (and)

This one uses broken RTL (and broken Gimple before that): s{min,max}
cannot be used for FP without -ffast-math.

>     * Code = power9, -ffast-math: Generate xsmaxcdp/xsmincdp.

xs{min,max}dp will work just as well.

> For the __builtin_fmax and __builtin_fmin functions:
>
>     * Code = power8, no -ffast-math: Generate call to fmax/fmin;
>     * Code = power8, -ffast-math: Generate xsmaxdp/xsmindp;
>     * Code = power9, no -ffast-math: Generate call to fmax/fmin; (and)
>     * Code = power9, -ffast-math: Generate xsmaxcdp/xsmincdp.

Same brokenness here.

> For IEEE 128-bit floating point, we only have xs{min,max}cqp.  We do not have
> the version without 'c' nor do we have the 'j' version.

And here.

Why would we ever prefer xsmincdp over xsmindp, other than for machine
code for some not-so-smart C code that will not do useful things for
signed zeros or NaNs (but using the "c" insns generates faster, smaller
code that has those silly semantics)?


Segher