On Tue, May 10, 2022 at 02:56:58PM -0400, Michael Meissner wrote:
> On Tue, May 10, 2022 at 07:27:30AM -0500, Segher Boessenkool wrote:
> > > IMHO, it's something we want to fix as well, based on the reasons:
> > >   1) bif names have the corresponding mnemonics, users would expect 1-1
> > >      mapping here.
> > >   2) clang emits xs{min,max}dp all the time, with cpu type power7/8/9/10.
> > >   3) according to uarch info, xs{min,max}cdp use the same units and have
> > >      the same latency, no benefits to replace with xs{min,max}cdp.
> > 
> > I never understood any of this.  Mike?  Why do we do those "c" things
> > at all, ever?
> 
> In the power7, we only had x{s,v}{min,max}{sp,dp}.  But those aren't useful
> for optimizing normal (a > b) ? a : b without using -ffast-math.

But RTL smin (as well as Gimple min_expr) is *undefined* without
-ffast-math (well, -ffinite-math-only and -fno-signed-zeros at least).
The only place we generate xs{min,max}[c]dp uses s{min,max}.  So the
much saner xs{min,max}dp are fine always.

> Power9 added the
> 'c' and 'j' versions of the insns.  GCC never generates the 'j' version.
> 
> Basically for ?: we generate:
> 
>     * Code = power8, no -ffast-math:    Generate compare, move;
>     * Code = power8, -ffast-math:       Generate xsmaxdp/xsmindp;
>     * Code = power9, no -ffast-math:    Generate xsmaxcdp/xsmincdp; (and)

This one uses broken RTL (and broken Gimple before that): s{min,max}
cannot be used for FP without -ffast-math.

>     * Code = power9, -ffast-math:       Generate xsmaxcdp/xsmincdp.

xs{min,max}dp will work just as well.

> For the __builtin_fmax and __builtin_fmin functions:
> 
>     * Code = power8, no -ffast-math:    Generate call to fmax/fmin;
>     * Code = power8, -ffast-math:       Generate xsmaxdp/xsmindp;
>     * Code = power9, no -ffast-math:    Generate call to fmax/fmin; (and)
>     * Code = power9, -ffast-math:       Generate xsmaxcdp/xsmincdp.

Same brokenness here.

> For IEEE 128-bit floating point, we only have xs{min,max}cqp.  We do not have
> the version without 'c' nor do we have the 'j' version.

And here.

Why would we ever prefer xsmincdp over xsmindp, other than for machine
code for some not-so-smart C code that will not do useful things for
signed zeros or NaNs (but using the "c" insns generates faster, smaller
code that has those silly semantics)?
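
To make the signed-zero and NaN point concrete, here is a minimal sketch
in plain C of the ?: idiom under discussion.  The helper names
(ternary_max, ternary_min, order_matters_*) are made up for illustration
and come from no GCC source; it assumes IEEE 754 doubles and a build
without -ffast-math.  It only shows what the C source expression does,
not what any particular instruction does.

```c
#include <math.h>

/* Hypothetical helper: the plain C idiom, (a > b) ? a : b.  */
double ternary_max (double a, double b)
{
  return (a > b) ? a : b;
}

/* Hypothetical helper: the matching idiom for the minimum.  */
double ternary_min (double a, double b)
{
  return (a < b) ? a : b;
}

/* Any comparison with a NaN is false, so the idiom always returns its
   second operand when one input is a NaN.  Swapping the operands
   therefore changes the answer.  Returns 1 if that is observed.  */
int order_matters_nan (void)
{
  double r1 = ternary_max (NAN, 1.0);   /* compare is false: gives 1.0 */
  double r2 = ternary_max (1.0, NAN);   /* compare is false: gives NaN */
  return !isnan (r1) && isnan (r2);
}

/* +0.0 and -0.0 compare equal, so the idiom again returns its second
   operand, whichever sign that happens to have.  Returns 1 if the sign
   of the result follows the operand order.  */
int order_matters_zero (void)
{
  double r1 = ternary_max (0.0, -0.0);  /* gives -0.0 */
  double r2 = ternary_max (-0.0, 0.0);  /* gives +0.0 */
  return signbit (r1) && !signbit (r2);
}
```

Both order_matters_nan () and order_matters_zero () return 1 when built
without -ffast-math, so the expression is not a commutative max.  By
contrast, C Annex F specifies that fmax returns the other argument when
exactly one argument is a NaN.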


Segher
