On Tue, Oct 29, 2024 at 07:19:38PM -0700, liuhongt wrote:
> Generate native instruction whenever possible, otherwise use vector
> permutation with odd indices.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ready push to trunk.
>
> gcc/ChangeLog:
>
> 	* config/i386/i386-expand.cc
> 	(ix86_expand_vector_sf2bf_with_vec_perm): New function.
> 	* config/i386/i386-protos.h
> 	(ix86_expand_vector_sf2bf_with_vec_perm): New declare.
> 	* config/i386/mmx.md (truncv2sfv2bf2): New expander.
> 	* config/i386/sse.md (truncv4sfv4bf2): Ditto.
> 	(truncv8sfv8bf2): Ditto.
> 	(truncv16sfv16bf2): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/i386/avx512bf16-truncsfbf.c: New test.
> 	* gcc.target/i386/avx512bw-truncsfbf.c: New test.
> 	* gcc.target/i386/ssse3-truncsfbf.c: New test.

Is that correct without -ffast-math?
I mean, truncation from SFmode to BFmode, e.g. when honoring NaNs, definitely
isn't a simple permutation.
An SFmode sNaN which has non-zero mantissa bits only in the lower 16 bits
would be silently turned into +-Inf rather than raising an exception and
being turned into a qNaN.
Similarly, when not using -ffast-math the result needs to be correctly
rounded (according to the current rounding mode, at least with
-frounding-math, otherwise at least for round to even); a permutation
definitely doesn't achieve that.

I don't know what exactly the hw instructions do, whether they perform
everything needed properly, just a subset of it, or none of it, but the
permutation fallback IMHO definitely needs to be guarded with the same
flags as the scalar code.
For the HONOR_NANS case or flag_rounding_math, the generic code (see expr.cc)
uses the libgcc fallback.  Otherwise, the generic code has
  /* If we don't expect qNaNs nor sNaNs and can assume rounding
     to nearest, we can expand the conversion inline as
     (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.  */
and the backend has
  TARGET_SSE2 && flag_unsafe_math_optimizations && !HONOR_NANS (BFmode)
shift (i.e. just the permutation).
Note that even
(fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16
is doable in vectors.

	Jakub
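
For illustration only, here is a minimal scalar sketch in C of the two
behaviours discussed above: keeping just the upper 16 bits (which is what
a permutation with odd indices amounts to per element) versus the
round-to-nearest-even expression quoted from the expr.cc comment.  The
function names are hypothetical and not taken from GCC, and neither
function handles NaNs properly, which is the reason the HONOR_NANS case
goes through libgcc.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a float as its 32-bit pattern.  */
uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

/* Plain truncation: keep the upper 16 bits of the SFmode pattern.
   No rounding; an sNaN whose payload lives only in the lower 16 bits
   comes out as an infinity.  */
uint16_t
sf2bf_truncate (uint32_t fromi)
{
  return (uint16_t) (fromi >> 16);
}

/* Round to nearest, ties to even, using the expression quoted from
   the expr.cc comment.  Only valid when no qNaNs or sNaNs are
   expected and round-to-nearest can be assumed.  */
uint16_t
sf2bf_round_nearest_even (uint32_t fromi)
{
  return (uint16_t) ((fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16);
}
```

With these, the input pattern 0x3F818000 (exactly halfway between two
BFmode values) gives 0x3F81 from sf2bf_truncate but 0x3F82 from
sf2bf_round_nearest_even, and the sNaN pattern 0x7F800001 gives 0x7F80
(+Inf) from sf2bf_truncate.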