On Tue, Nov 5, 2024 at 4:46 PM Jakub Jelinek <ja...@redhat.com> wrote:
>
> On Tue, Oct 29, 2024 at 07:19:38PM -0700, liuhongt wrote:
> > Generate native instruction whenever possible, otherwise use vector
> > permutation with odd indices.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ready push to trunk.
> >
> > gcc/ChangeLog:
> >
> > 	* config/i386/i386-expand.cc
> > 	(ix86_expand_vector_sf2bf_with_vec_perm): New function.
> > 	* config/i386/i386-protos.h
> > 	(ix86_expand_vector_sf2bf_with_vec_perm): New declare.
> > 	* config/i386/mmx.md (truncv2sfv2bf2): New expander.
> > 	* config/i386/sse.md (truncv4sfv4bf2): Ditto.
> > 	(truncv8sfv8bf2): Ditto.
> > 	(truncv16sfv16bf2): Ditto.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	* gcc.target/i386/avx512bf16-truncsfbf.c: New test.
> > 	* gcc.target/i386/avx512bw-truncsfbf.c: New test.
> > 	* gcc.target/i386/ssse3-truncsfbf.c: New test.
>
> Is that correct for non-ffast-math?
> I mean, truncation from SF to BFmode e.g. when honoring NaNs definitely
> isn't a simple permutation.
> A SFmode sNaN which has non-zero bits in the mantissa only in the lower
> 16-bits would be silently turned into +-Inf rather than raise exception
> and turn it into a qNaN.
> Similarly, the result when not using -ffast-math needs to be correctly
> rounded (according to the current rounding mode, at least with
> -frounding-math, otherwise at least for round to even), permutation
> definitely doesn't achieve that.
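To make the sNaN point concrete, here is a minimal C sketch on raw IEEE binary32 bit patterns. `sf2bf_truncate` models what a pure permutation fallback produces for each element (the high 16 bits). `sf2bf_quiet_nan` and `is_nan_bits` are hypothetical helpers, not GCC code. They only show what the result should look like for a NaN input (a quiet NaN with the same sign). They raise no FE_INVALID exception and do not round non-NaN inputs, so this is not a complete conversion.

```c
#include <stdint.h>

/* Bit-level model of a pure permutation fallback: each bf16 lane is just
   the high 16 bits of the corresponding binary32 lane (truncation). */
static uint16_t sf2bf_truncate(uint32_t f)
{
    return (uint16_t)(f >> 16);
}

/* True if F encodes a NaN: all-ones exponent and a non-zero mantissa. */
static int is_nan_bits(uint32_t f)
{
    return (f & 0x7f800000u) == 0x7f800000u && (f & 0x007fffffu) != 0;
}

/* Hypothetical helper (not GCC code): map any NaN to a quiet bf16 NaN with
   the same sign by setting the bf16 quiet bit (0x0040).  Non-NaN inputs fall
   back to truncation, so rounding is NOT handled here, and no FE_INVALID
   exception is raised for sNaN inputs. */
static uint16_t sf2bf_quiet_nan(uint32_t f)
{
    if (is_nan_bits(f))
        return (uint16_t)((f >> 16) | 0x0040u);
    return sf2bf_truncate(f);
}
```

For the input 0x7f800001 (an sNaN whose only set mantissa bit lies in the low half), truncation returns 0x7f80, which is +Inf in bf16. The NaN-aware variant returns 0x7fc0, a quiet NaN.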

Yes, there's a mismatch between the scalar and vector code. I assumed
users may not care much about precision/NaN/Inf/denormal behavior for
vector code. This is similar to how we handle the default reciprocal
mask:

  #define RECIP_MASK_DEFAULT (RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT)

which enables the vector variants but leaves RECIP_MASK_DIV |
RECIP_MASK_SQRT off.

>
> I don't know what exactly the hw instructions do, whether they perform
> everything needed properly or just subset of it or none of it,

A subset of it. The hw instruction doesn't raise exceptions and always
rounds to nearest (even). Output denormals are always flushed to zero
and input denormals are always treated as zero. MXCSR is neither
consulted nor updated.

> but the permutation fallback IMHO definitely needs to be guarded with
> the same flags as scalar code.
> For HONOR_NANS case or flag_rounding_math, the generic code (see expr.cc)
> uses the libgcc fallback.  Otherwise, generic code has
>       /* If we don't expect qNaNs nor sNaNs and can assume rounding
>          to nearest, we can expand the conversion inline as
>          (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.  */
> and the backend has
>   TARGET_SSE2 && flag_unsafe_math_optimizations && !HONOR_NANS (BFmode)
> shift (i.e. just the permutation).
> Note, even that (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16
> is doable in vectors.

If you're concerned about that, I'll commit a follow-up patch to align
the conditions of the vector expanders with the scalar ones for both
extendmn2 and truncmn2.

>
> Jakub
>

--
BR,
Hongtao
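For reference, the inline expansion quoted from expr.cc can be written per element in C as below. This is a sketch assuming that `float` is IEEE binary32, that NaNs need not be honored and that the rounding mode is round-to-nearest-even, which are the same preconditions as the generic inline path. `sf2bf_rne` and `sf2bf_rne_array` are hypothetical names. The loop body is branch-free (shift, and, add, shift), which is why the same computation maps naturally onto vector integer instructions. Whether a particular GCC version auto-vectorizes this loop depends on the target and options.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round-to-nearest-even SF->BF on raw bits:
   (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.
   Precondition (assumed): the input is not a NaN and the rounding mode is
   round-to-nearest-even.  Unsigned arithmetic, so overflow wraps (no UB). */
static uint16_t sf2bf_rne(uint32_t fromi)
{
    uint32_t lsb = (fromi >> 16) & 1u;      /* low bit of the kept half */
    return (uint16_t)((fromi + 0x7fffu + lsb) >> 16);
}

/* Apply the same branch-free computation element by element.  memcpy is
   used for the float -> uint32_t reinterpretation to avoid aliasing UB. */
static void sf2bf_rne_array(uint16_t *dst, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t bits;
        memcpy(&bits, &src[i], sizeof bits);
        dst[i] = sf2bf_rne(bits);
    }
}
```

On exact ties the formula rounds to even: 0x3f808000 gives 0x3f80 and 0x3f818000 gives 0x3f82. It is not safe for NaNs, which is why it is only used when they need not be honored. The sNaN 0x7f800001 still becomes 0x7f80 (+Inf), and 0x7fffffff carries into the sign bit and yields 0x8000 (-0.0).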