https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100627

--- Comment #2 from g.peterh...@t-online.de ---
Hello,
I found a better solution here:
https://stackoverflow.com/questions/41144668/how-to-efficiently-perform-double-int64-conversions-with-sse-avx
and ported it to "normal" C++ code (no intrinsics):
https://godbolt.org/z/scjEdze99
This has the following advantages:
- constexpr
- flexible: can be vectorized (autovectorization)

These implementations require C++20 (std::bit_cast and constexpr std::exp2),
but can easily be adapted to older C++ versions. This trick can possibly also
be used for s/uint64 -> float32, which would save the detour s/uint64 ->
float64 -> float32.

However, I have observed the following:
- with -march=skylake-avx512 alone, no AVX512 code is generated
- it only works with -march=skylake-avx512 -mprefer-vector-width=512 or with
-mavx512f -mavx512dq -mavx512vl
- for s/uint64 -> float32, no correct AVX512 code is generated either
(_mm512_cvtepi64_ps, _mm512_cvtepu64_ps)
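
A minimal reproducer sketch for checking the observations above in the
generated assembly. The function names u64_to_f64 and u64_to_f32 are
hypothetical, and the loops use plain static_cast conversions rather than the
code from the godbolt link. The option sets in the comment are the ones listed
above. As far as I know, GCC's tuning for skylake-avx512 prefers 256-bit
vectors by default, which would be consistent with the first observation, but
that is an assumption and not something verified here.

```cpp
#include <cstddef>
#include <cstdint>

// Compile with -O3 plus one of the option sets from the list above, e.g.
//   -march=skylake-avx512
//   -march=skylake-avx512 -mprefer-vector-width=512
//   -mavx512f -mavx512dq -mavx512vl
// and compare which vector registers and conversion instructions appear.

// uint64 -> float64, one element per iteration.
void u64_to_f64(const std::uint64_t* in, double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<double>(in[i]);
    }
}

// uint64 -> float32, the case the _mm512_cvtepu64_ps intrinsic covers.
void u64_to_f32(const std::uint64_t* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<float>(in[i]);
    }
}
```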

thx
Gero
