On Tue, Nov 5, 2024 at 5:19 PM Jakub Jelinek <ja...@redhat.com> wrote:
>
> On Tue, Nov 05, 2024 at 05:12:56PM +0800, Hongtao Liu wrote:
> > Yes, there's a mismatch between scalar and vector code, I assume users
> > may not care much about precision/NAN/INF/denormal behaviors for
> > vector code.
> > Just like we support
> > #define RECIP_MASK_DEFAULT (RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)
> > but turn off
> > RECIP_MASK_DIV | RECIP_MASK_SQRT.
>
> Users who don't care should be using -ffast-math. Users who do care
> should get proper behavior.
>
> > > I don't know what exactly the hw instructions do, whether they perform
> > > everything needed properly or just subset of it or none of it,
> >
> > Subset of it, hw instruction doesn't raise exceptions and always round
> > to nearest (even). Output denormals are always flushed to zero and
> > input denormals are always treated as zero. MXCSR is not consulted nor
> > updated.
>
> Does it turn the sNaNs into infinities or qNaNs silently?

Yes.

> Given the rounding, flag_rounding_math should avoid the hw instructions,

The default rounding mode for flag_rounding_math is rounding to
nearest, so I assume !flag_rounding_math is not needed for the
condition.

> and either HONOR_NANS or HONOR_SNANS should be used to predicate that.
>
> > > but the permutation fallback IMHO definitely needs to be guarded with
> > > the same flags as scalar code.
> > > For HONOR_NANS case or flag_rounding_math, the generic code (see expr.cc)
> > > uses the libgcc fallback. Otherwise, generic code has
> > > /* If we don't expect qNaNs nor sNaNs and can assume rounding
> > > to nearest, we can expand the conversion inline as
> > > (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16. */
> > > and the backend has
> > > TARGET_SSE2 && flag_unsafe_math_optimizations && !HONOR_NANS (BFmode)
> > > shift (i.e. just the permutation).
> > > Note, even that (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16
> > > is doable in vectors.
> >
> > If you're concerned about that, I'll commit another patch to align the
> > condition of the vector expander with scalar ones for both extendmn2
> > and truncmn2.
>
> For the fallback, for HONOR_NANS or flag_rounding_math we just shouldn't
> use the fallback at all. For flag_unsafe_math_optimizations, we can just
> use the simple permutation, i.e. fromi >> 16, otherwise can use that
> (fromi + 0x7fff + ((fromi >> 16) & 1)) followed by the permutation.
>
> Jakub
>

--
BR,
Hongtao
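
For reference, the two inline float -> bfloat16 fallbacks discussed above
can be sketched in scalar C roughly as follows. This is only an
illustration of the arithmetic, not the expr.cc or i386 backend code; the
function names (float_bits, bf16_trunc_bits, bf16_rne_bits) are made up
for the example, and it assumes float is a 32-bit IEEE 754 type.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a float as its 32-bit pattern
   (called "fromi" in the thread).  Assumes 32-bit IEEE 754 float.  */
uint32_t float_bits(float f)
{
    uint32_t fromi;
    memcpy(&fromi, &f, sizeof fromi);
    return fromi;
}

/* Simple "permutation" variant: keep only the upper 16 bits, i.e.
   fromi >> 16.  This truncates the mantissa instead of rounding, and
   a NaN whose payload sits only in the low 16 bits becomes infinity.  */
uint16_t bf16_trunc_bits(uint32_t fromi)
{
    return (uint16_t)(fromi >> 16);
}

/* Round-to-nearest-even variant quoted from the generic code comment:
   (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.
   No NaN handling: the addition can carry out of a NaN bit pattern,
   so this sketch is only meaningful when NaNs are not expected.  */
uint16_t bf16_rne_bits(uint32_t fromi)
{
    return (uint16_t)((fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16);
}
```

For example, the tie pattern 0x3f808000 rounds down to the even result
0x3f80, while 0x3f818000 rounds up to 0x3f82. A NaN pattern such as
0x7fffffff wraps around to 0x8000 (negative zero) in the rounding
variant, which illustrates why that expansion is only used when NaNs do
not have to be honored.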