On Tue, Oct 29, 2024 at 07:19:38PM -0700, liuhongt wrote:
> Generate native instruction whenever possible, otherwise use vector
> permutation with odd indices.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ready push to trunk.
> 
> gcc/ChangeLog:
> 
>       * config/i386/i386-expand.cc
>       (ix86_expand_vector_sf2bf_with_vec_perm): New function.
>       * config/i386/i386-protos.h
>       (ix86_expand_vector_sf2bf_with_vec_perm): New declare.
>       * config/i386/mmx.md (truncv2sfv2bf2): New expander.
>       * config/i386/sse.md (truncv4sfv4bf2): Ditto.
>       (truncv8sfv8bf2): Ditto.
>       (truncv16sfv16bf2): Ditto.
> 
> gcc/testsuite/ChangeLog:
> 
>       * gcc.target/i386/avx512bf16-truncsfbf.c: New test.
>       * gcc.target/i386/avx512bw-truncsfbf.c: New test.
>       * gcc.target/i386/ssse3-truncsfbf.c: New test.

Is that correct without -ffast-math?
I mean, truncation from SFmode to BFmode, e.g. when honoring NaNs, definitely
isn't a simple permutation.
An SFmode sNaN whose only non-zero mantissa bits are in the lower
16 bits would be silently turned into +-Inf rather than raising an exception
and being turned into a qNaN.
Similarly, the result when not using -ffast-math needs to be correctly
rounded (according to the current rounding mode, at least with
-frounding-math, otherwise at least for round to nearest even); a permutation
definitely doesn't achieve that.
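
To make both problems concrete, here is a minimal sketch in plain C that
works on raw bit patterns.  The helper name bf16_truncate_bits is
hypothetical (it is not from the patch); it only models what keeping the
odd 16-bit halves does to each element.

```c
#include <stdint.h>

/* Hypothetical model of the permutation fallback for one element:
   keep only the upper 16 bits of the SFmode bit pattern.  */
static uint16_t
bf16_truncate_bits (uint32_t fromi)
{
  return (uint16_t) (fromi >> 16);
}
```

With that model, the sNaN 0x7f800001 becomes 0x7f80, which is +Inf in
BFmode, and 0x3f80ffff becomes 0x3f80 although 0x3f81 is the nearer
BFmode value.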

I don't know what exactly the hw instructions do, whether they perform
everything needed properly or just a subset of it or none of it,
but the permutation fallback IMHO definitely needs to be guarded with
the same flags as the scalar code.
For the HONOR_NANS case or with flag_rounding_math, the generic code (see
expr.cc) uses the libgcc fallback.  Otherwise, the generic code has
          /* If we don't expect qNaNs nor sNaNs and can assume rounding
             to nearest, we can expand the conversion inline as
             (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.  */
and the backend has, guarded by
TARGET_SSE2 && flag_unsafe_math_optimizations && !HONOR_NANS (BFmode),
just a shift (i.e. just the permutation).
Note that even (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16
is doable in vectors.
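
As a rough sketch of that formula applied element-wise (plain C rather
than RTL or intrinsics; the names bf16_round_bits and
bf16_round_bits_array are hypothetical and only for illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* Round-to-nearest-even SFmode -> BFmode conversion on raw bits.
   Only valid when neither qNaNs nor sNaNs are expected: a NaN payload
   can still be rounded away or carry out of the mantissa.  */
static uint16_t
bf16_round_bits (uint32_t fromi)
{
  return (uint16_t) ((fromi + 0x7fffu + ((fromi >> 16) & 1u)) >> 16);
}

/* The same computation over an array.  Each element needs only an add,
   a shift and a mask, so the loop maps onto ordinary integer vector
   operations.  */
static void
bf16_round_bits_array (uint16_t *dst, const uint32_t *src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = bf16_round_bits (src[i]);
}
```

For example, the ties 0x3f808000 and 0x3f818000 round to the even
results 0x3f80 and 0x3f82, and 0x3f80ffff rounds up to 0x3f81.  The
sNaN 0x7f800001 still ends up as 0x7f80 (+Inf), which is why the NaN
assumption matters.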

        Jakub
