Re: [PATCH, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]

Segher Boessenkool Tue, 10 May 2022 13:25:10 -0700

On Tue, May 10, 2022 at 02:56:58PM -0400, Michael Meissner wrote:
> On Tue, May 10, 2022 at 07:27:30AM -0500, Segher Boessenkool wrote:
> > > IMHO, it's something we want to fix as well, based on the reasons:
> > >   1) bif names have the corresponding mnemonics, users would expect
> > >      1-1 mapping here.
> > >   2) clang emits xs{min,max}dp all the time, with cpu type power7/8/9/10.
> > >   3) according to uarch info, xs{min,max}cdp use the same units and
> > >      have the same latency, no benefits to replace with xs{min,max}cdp.
> > 
> > I never understood any of this.  Mike?  Why do we do those "c" things
> > at all, ever?
> 
> In the power7, we only had x{s,v}{min,max}{sp,dp}.  But those aren't useful
> for optimizing normal (a > b) ? a : b without using -ffast-math.


But RTL smin (as well as Gimple min_expr) is *undefined* without
-ffast-math (well, -ffinite-math-only and -fno-signed-zeros at least).
The only place we generate xs{min,max}[c]dp uses s{min,max}.  So the
much saner xs{min,max}dp are fine always.
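
To make the NaN and signed-zero concern concrete, here is a minimal C
sketch (not taken from the thread) of the open-coded idiom.  The names
max_ternary and check_ternary_asymmetry are hypothetical, and the checks
assume the file is compiled without -ffast-math:

```c
#include <assert.h>
#include <math.h>

/* The open-coded idiom from the thread: a plain C conditional. */
static double max_ternary(double a, double b)
{
    return (a > b) ? a : b;
}

/* Any comparison involving a NaN is false, and +0.0 > -0.0 is false too,
   so in both cases the result depends only on the operand order. */
static void check_ternary_asymmetry(void)
{
    /* NaN > 1.0 is false: the second operand is returned. */
    assert(max_ternary(NAN, 1.0) == 1.0);
    /* 1.0 > NaN is false as well: now the NaN comes back. */
    assert(isnan(max_ternary(1.0, NAN)));
    /* The zeros compare equal, so whichever is second wins. */
    assert(signbit(max_ternary(0.0, -0.0)));
    assert(!signbit(max_ternary(-0.0, 0.0)));
}
```

An order-independent min/max operation cannot reproduce these results,
which is why the idiom only maps onto one when NaNs and signed zeros are
assumed away (-ffinite-math-only and -fno-signed-zeros).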

> Power9 added the
> 'c' and 'j' versions of the insns.  GCC never generates the 'j' version.
> 
> Basically for ?: we generate:
> 
>     * Code = power8, no -ffast-math:    Generate compare, move;
>     * Code = power8, -ffast-math:       Generate xsmaxdp/xsmindp;
>     * Code = power9, no -ffast-math:    Generate xsmaxcdp/xsmincdp; (and)

This one uses broken RTL (and broken Gimple before that): s{min,max}
cannot be used for FP without -ffast-math.

>     * Code = power9, -ffast-math:       Generate xsmaxcdp/xsmincdp.

xs{min,max}dp will work just as well.

> For the __builtin_fmax and __builtin_fmin functions:
> 
>     * Code = power8, no -ffast-math:    Generate call to fmax/fmin;
>     * Code = power8, -ffast-math:       Generate xsmaxdp/xsmindp;
>     * Code = power9, no -ffast-math:    Generate call to fmax/fmin; (and)
>     * Code = power9, -ffast-math:       Generate xsmaxcdp/xsmincdp.

Same brokenness here.

> For IEEE 128-bit floating point, we only have xs{min,max}cqp.  We do not have
> the version without 'c' nor do we have the 'j' version.

And here.

Why would we ever prefer xsmincdp over xsmindp, other than for machine
code for some not-so-smart C code that will not do useful things for
signed zeros or NaNs (but using the "c" insns generates faster, smaller
code that has those silly semantics)?
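
The contrast between the two flavours can be modelled in plain C.  This
is only a sketch under stated assumptions: model_max_c and
model_max_plain are hypothetical helpers, and the semantics they encode
(the "c" form behaving like the C conditional, the plain form treating a
quiet NaN as missing data and preferring +0.0 over -0.0) are a reading of
the instruction descriptions, not something stated in this thread:

```c
#include <assert.h>
#include <math.h>

/* Assumed model of the "c" flavour: exactly the C conditional. */
static double model_max_c(double a, double b)
{
    return (a > b) ? a : b;
}

/* Assumed model of the plain flavour: fmax-style handling of quiet NaNs
   and signed zeros (signalling NaNs are not modelled here). */
static double model_max_plain(double a, double b)
{
    if (isnan(a))
        return b;                       /* NaN is treated as missing data */
    if (isnan(b))
        return a;
    if (a == b)
        return signbit(a) ? b : a;      /* prefer +0.0 over -0.0 */
    return (a > b) ? a : b;
}

/* The models agree on ordinary inputs and differ only on the corner cases. */
static void check_models(void)
{
    assert(model_max_c(2.0, 3.0) == model_max_plain(2.0, 3.0));
    assert(isnan(model_max_c(1.0, NAN)));
    assert(model_max_plain(1.0, NAN) == 1.0);
    assert(signbit(model_max_c(0.0, -0.0)));
    assert(!signbit(model_max_plain(0.0, -0.0)));
}
```

Under -ffinite-math-only and -fno-signed-zeros none of the differing
inputs can occur, so either model (and, by the argument above, either
instruction) gives the same answer.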


Segher

Re: [PATCH, rs6000] Implemented f[min/max]_optab by xs[min/max]dp [PR103605]
