On Sat, 8 Feb 2025 at 10:55, Matthew Malcomson <mmalcom...@nvidia.com> wrote: > > Hi Jonathan! > > Many thanks! Will learn the libstdc++ style eventually. > > I've run bootstrap & regression test on this, and did the manual checks > I mentioned before of compiling atomic_float/1.cc with clang and then > adding `+ 1` on the builtin codepath to check that the clang binary > aborts while the GCC built binary passes.
Great, thanks for checking it. I'll get this pushed early next week. > > Thanks again! > Matthew > > On 2/7/25 15:33, Jonathan Wakely wrote: > > External email: Use caution opening links or attachments > > > > > > On 05/02/25 13:43 +0000, Jonathan Wakely wrote: > >> On 28/10/24 17:15 +0000, mmalcom...@nvidia.com wrote: > >>> From: Matthew Malcomson <mmalcom...@nvidia.com> > >>> > >>> I noticed that the libstdc++ patch is essentially separate and figured I > >>> could send it upstream earlier to give reviewers more time to look at > >>> it. > >>> I am still working on adding the ability to use floating point types in > >>> the __atomic_fetch_add builtins > >>> > >>> Review of current state and motivation (for anyone reading this that has > >>> not already seen the previous patches): > >>> - Some hardware has support for floating point atomic fetch_add (and > >>> similar). > >>> - There are existing compilers targetting this hardware that use > >>> libstdc++ -- e.g. NVC++. > >>> - Since the libstdc++ atomic<float>::fetch_add and similar is written > >>> directly as a CAS loop these compilers can not emit optimal code when > >>> seeing such constructs. > >>> - I hope to use __atomic_fetch_add builtins on floating point types > >>> directly in libstdc++ so these compilers can emit better code. > >>> - Clang already handles some floating point types in the > >>> __atomic_fetch_add family of builtins. > >>> - In order to only use this when available, I originally thought I could > >>> check against the resolved versions of the builtin in a manner > >>> something like `__has_builtin(__atomic_fetch_add_<fp-suffix>)`. > >>> I then realised that clang does not expose resolved versions of these > >>> atomic builtins to the user. > >>> From the clang discourse it was suggested we instead use SFINAE (which > >>> clang already supports). > >>> - I have posted a patch for allowing the use of SFINAE on builtins (not > >>> yet reviewed). > >>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664999.html > >>> *This* patch builds and regtests on top of that patch. It does not > >>> change what happens for GCC, while it uses the builtin for codegen with > >>> clang. > >>> - I have previously sent a patchset upstream adding the ability to use > >>> __atomic_fetch_add and similar on floating point types. > >>> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663355.html > >>> I hope to send a full patchset up soon including the suggestions given > >>> there. > >>> With that patchset included (plus the automatic linking of libatomic > >>> as Joseph pointed out in the email below > >>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665408.html ) > >>> then current GCC should start to use the builtin branch added in this > >>> patch. > >>> > >>> So *currently*, this patch allows external compilers (NVC++ in > >>> particular) to generate better code, and similarly lets clang understand > >>> the operation better since it maps to a known builtin. > >>> > >>> I hope that by GCC 15 this patch would also allow GCC to understand the > >>> operation better via mapping to a known builtin. > >>> > >>> ----------------- 8< ----------- >8 ---------------- > >>> > >>> Points to question here are: > >>> 1) Is this the best approach for using SFINAE to check if this builtin > >>> has a particular overload? > >>> Don't know of a better approach, but not an expert in C++ templating. > >> > >> Concepts! > >> > >>> We still need the CAS loop fallback for any compiler that doesn't > >>> implement this builtin. Once all compilers we care about implement this > >>> we can remove this special handling and merge the floating point and > >>> integral operations into the same template. > >>> > >>> Testing done: > >>> N.b. all testing done on top of the patch introducing SFINAE on builtins > >>> here, all testing done on AArch64: > >>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664999.html > >>> > >>> 1) No change seene in bootstrap & regression test on AArch64 > >>> 2) Manually ran the atomic_float/1.cc testcase with clang++ 19 and > >>> things passed. With clang++ 18 there was a failure independent to > >>> this change where the use of `is_lock_free` in the testcase was not > >>> optimised away so we got a linker error. After editing the testcase > >>> it also passes with clang++ 18. > >>> 3) Manually checked that when compiling with clang we follow the branch > >>> that uses the builtin for `float` (because clang has already > >>> implemented these builtins for `float`). > >>> - Done by adding `+1` to the result of that branch and checking that > >>> we abort when running the result. > >>> 4) Manually checked that when compiling with GCC we follow the branch > >>> that does not use the builtin for `float`. > >>> - Done this by adding the same temporary bug to the header in the > >>> builtin branch, and re-running tests to see that we still pass with > >>> GCC. > >>> > >>> Ok for trunk? > >>> > >>> libstdc++-v3/ChangeLog: > >>> > >>> * include/bits/atomic_base.h > >>> (__atomic_impl::__is_atomic_fetch_add_available): Define new > >>> struct using SFINAE to check whether __atomic_fetch_add is > >>> implemented on floating point type. > >>> (std::__atomic_impl::__fetch_add_flt): `if constexpr` branch > >>> on the above SFINAE test to use __atomic_fetch_add when > >>> available. > >>> (__atomic_impl::__is_atomic_add_fetch_available, > >>> std::__atomic_impl::__add_fetch_flt): Likewise. > >>> (__atomic_impl::__is_atomic_fetch_sub_available, > >>> std::__atomic_impl::__fetch_sub_flt): Likewise. > >>> (__atomic_impl::__is_atomic_sub_fetch_available, > >>> std::__atomic_impl::__sub_fetch_flt): Likewise. > >>> > >>> Signed-off-by: Matthew Malcomson <mmalcom...@nvidia.com> > >>> --- > >>> libstdc++-v3/include/bits/atomic_base.h | 116 ++++++++++++++++++------ > >>> 1 file changed, 90 insertions(+), 26 deletions(-) > >>> > >>> diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/ > >>> include/bits/atomic_base.h > >>> index 72cc4bae6cf..d7671b0e9d2 100644 > >>> --- a/libstdc++-v3/include/bits/atomic_base.h > >>> +++ b/libstdc++-v3/include/bits/atomic_base.h > >>> @@ -1209,54 +1209,118 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > >>> __xor_fetch(_Tp* __ptr, _Val<_Tp> __i) noexcept > >>> { return __atomic_xor_fetch(__ptr, __i, __ATOMIC_SEQ_CST); } > >>> > >>> + template <typename _Tp, typename = void> > >>> + struct __is_atomic_fetch_add_available : false_type > >>> + { > >>> + }; > >>> + template <typename T> > >> > >> This T isn't uglified as _Tp, but it doesn't matter because ... > >> > >>> + struct __is_atomic_fetch_add_available< > >>> + T, std::void_t<decltype (__atomic_fetch_add ( > >>> + std::declval<T *> (), std::declval<T> (), int ()))>> : > >>> true_type > >>> + { > >>> + }; > >>> + > >> > >> Atomics for floating-point types are only in C++20, so we can use > >> Concepts for these constraints. > >> > >> template<typename _Tp> > >> concept __atomic_fetch_addable > >> = requires (_Tp __t) { __atomic_fetch_add(&__t, __t, 1); }; > >> > >> I assume this will work fine if the void_t version worked. > >> > >>> template<typename _Tp> > >>> _Tp > >>> __fetch_add_flt(_Tp* __ptr, _Val<_Tp> __i, memory_order __m) > >>> noexcept > >>> { > >>> - _Val<_Tp> __oldval = load(__ptr, memory_order_relaxed); > >>> - _Val<_Tp> __newval = __oldval + __i; > >>> - while (!compare_exchange_weak(__ptr, __oldval, __newval, __m, > >>> - memory_order_relaxed)) > >>> - __newval = __oldval + __i; > >>> - return __oldval; > >>> + if constexpr (__is_atomic_fetch_add_available<_Tp>::value) > >> > >> Then this would be just: > >> > >> if constexpr (__atomic_fetch_addable<_Tp>) > >> > >> i.e. no ::value > >> > >>> + return __atomic_fetch_add (__ptr, __i, int (__m)); > > > > Also, we don't put a space before the opening paren in libstdc++ code. > > That weird GNU-ism isn't used here :-) > > > > I've attached a revised patch with my suggested changes. I've tested > > this with GCC trunk on x86_64-linux. Could you re-run your tests with > > this please? > > > > >