On Tue, Nov 5, 2024 at 5:19 PM Jakub Jelinek <ja...@redhat.com> wrote:
>
> On Tue, Nov 05, 2024 at 05:12:56PM +0800, Hongtao Liu wrote:
> > Yes, there's a mismatch between scalar and vector code, I assume users
> > may not care much about precision/NAN/INF/denormal behaviors for
> > vector code.
> > Just like we support
> > #define RECIP_MASK_DEFAULT (RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)
> >  but turn off
> > RECIP_MASK_DIV | RECIP_MASK_SQRT.
>
> Users who don't care should be using -ffast-math.  Users who do care
> should get proper behavior.
>
> > > I don't know what exactly the hw instructions do, whether they perform
> > > everything needed properly or just subset of it or none of it,
> >
> > Subset of it, hw instruction doesn't raise exceptions and always rounds
> > to nearest (even). Output denormals are always flushed to zero and
> > input denormals are always treated as zero. MXCSR is not consulted nor
> > updated.
>
> Does it turn the sNaNs into infinities or qNaNs silently?
Yes.
> Given the rounding, flag_rounding_math should avoid the hw instructions,
The default rounding mode for flag_rounding_math is rounding to
nearest, so I assume !flag_rounding_math is not needed for the
condition.

> and either HONOR_NANS or HONOR_SNANS should be used to predicate that.
>
> > > but the permutation fallback IMHO definitely needs to be guarded with
> > > the same flags as scalar code.
> > > For HONOR_NANS case or flag_rounding_math, the generic code (see expr.cc)
> > > uses the libgcc fallback.  Otherwise, generic code has
> > >           /* If we don't expect qNaNs nor sNaNs and can assume rounding
> > >              to nearest, we can expand the conversion inline as
> > >              (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.  */
> > > and the backend has
> > > TARGET_SSE2 && flag_unsafe_math_optimizations && !HONOR_NANS (BFmode)
> > > shift (i.e. just the permutation).
> > > Note, even that (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16
> > > is doable in vectors.
> >
> > If you're concerned about that, I'll commit another patch to align the
> > condition of the vector expander with scalar ones for both extendmn2
> > and truncmn2.
>
> For the fallback, for HONOR_NANS or flag_rounding_math we just shouldn't
> use the fallback at all.  For flag_unsafe_math_optimizations, we can just
> use the simple permutation, i.e. fromi >> 16, otherwise can use that
> (fromi + 0x7fff + ((fromi >> 16) & 1)) followed by the permutation.
>
>         Jakub
>
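
For reference, a minimal scalar sketch in plain C of the two inline
fallbacks discussed above. This is not the GCC expander code; the function
names are made up for illustration, and both functions take the raw 32-bit
pattern of the float (called fromi, as in the expr.cc comment).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a float as its 32-bit pattern. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Simple permutation/shift: keep only the upper 16 bits (truncation).
   An sNaN such as 0x7f800001 becomes 0x7f80, i.e. +Inf.  */
static uint16_t bf16_truncate(uint32_t fromi)
{
    return (uint16_t)(fromi >> 16);
}

/* Round to nearest, ties to even, using the formula from the expr.cc
   comment.  Only meaningful when no NaNs are expected: for example the
   NaN pattern 0x7fffffff carries into the sign bit and yields 0x8000.  */
static uint16_t bf16_round_nearest_even(uint32_t fromi)
{
    return (uint16_t)((fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16);
}
```

The two variants differ only for inputs whose low 16 bits are nonzero,
which is why the plain shift is acceptable only under
flag_unsafe_math_optimizations.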


-- 
BR,
Hongtao
