https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100627
--- Comment #2 from g.peterh...@t-online.de ---
Hello,

I found a better solution here:
https://stackoverflow.com/questions/41144668/how-to-efficiently-perform-double-int64-conversions-with-sse-avx
and ported it to "normal" C++ code (no intrinsics): https://godbolt.org/z/scjEdze99

This has these advantages:
- constexpr
- flexible
- can be vectorized (autovectorization)

These implementations require C++20 (std::bit_cast and constexpr std::exp2), but they can easily be implemented with older C++ versions.

This trick can possibly also be used for s/uint64 -> float32, which would avoid the detour s/uint64 -> float64 -> float32.

However, I have observed the following:
- with -march=skylake-avx512, no AVX512 code is generated
- AVX512 code is generated only with -march=skylake-avx512 -mprefer-vector-width=512, or with -mavx512f -mavx512dq -mavx512vl
- for s/uint64 -> float32, no correct AVX512 code is generated either (_mm512_cvtepi64_ps, _mm512_cvtepu64_ps)

thx
Gero