On Wed, 12 Feb 2025 at 12:05, Matthew Malcomson <mmalcom...@nvidia.com> wrote: > > Hi there -- since you've not pushed quite yet can I ask that you use the > following commit message instead of the existing one? > (Updated a few comments to match what has changed since I wrote that > message). > > Apologies for not noticing it earlier.
Sure, I'll update it before I push. > > > ---- > libstdc++: Conditionally use floating-point fetch_add builtins > > - Some hardware has support for floating point atomic fetch_add (and > similar). > - There are existing compilers targetting this hardware that use > libstdc++ -- e.g. NVC++. > - Since the libstdc++ atomic<float>::fetch_add and similar is written > directly as a CAS loop these compilers can not emit optimal code when > seeing such constructs. > - I hope to use __atomic_fetch_add builtins on floating point types > directly in libstdc++ so these compilers can emit better code. > - Clang already handles some floating point types in the > __atomic_fetch_add family of builtins. > - In order to only use this when available, I originally thought I could > check against the resolved versions of the builtin in a manner > something like `__has_builtin(__atomic_fetch_add_<fp-suffix>)`. > I then realised that clang does not expose resolved versions of these > atomic builtins to the user. > From the clang discourse it was suggested we instead use SFINAE (which > clang already supports). > - I have recently pushed a patch for allowing the use of SFINAE on > builtins. > https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664999.html > Now that patch is upstream, this patch does not change what happens > for GCC, while it uses the builtin for codegen with clang. > - I have previously sent a patchset upstream adding the ability to use > __atomic_fetch_add and similar on floating point types. > https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668754.html > Once that patchset is upstream (plus the automatic linking of > libatomic as Joseph pointed out in the email below > https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665408.html ) > then current GCC should start to use the builtin branch added in this > patch. > > So *currently*, this patch allows external compilers (NVC++ in > particular) to generate better code, and similarly lets clang understand > the operation better since it maps to a known builtin. > > I hope that by GCC 16 this patch would also allow GCC to understand the > operation better via mapping to a known builtin. > > Testing done: > 1) No change seen in bootstrap & regression test on AArch64 > 2) No change seen in bootstrap & regression test on x86_64 (Jonathan did > this test). > 3) Manually checked that when compiling with clang we follow the branch > that uses the builtin for `float` (because clang has already > implemented these builtins for `float`). > - Done by adding `+1` to the result of the builtin branch and > checking that we abort when running the result for each of > fetch_add/fetch_sub/add_fetch/sub_fetch in turn (i.e. 4 separate > manual checks). > 4) Manually checked that when compiling with GCC we follow the branch > that does not use the builtin for `float`. > - Done this by adding the same temporary bug to the header in the > builtin branch, and re-running tests to see that we still pass with > GCC. > > libstdc++-v3/ChangeLog: > > * include/bits/atomic_base.h (__atomic_fetch_addable): Define > new concept. > (__atomic_impl::__fetch_add_flt): Use new concept to make use of > __atomic_fetch_add when available. > (__atomic_fetch_subtractable, __fetch_sub_flt): Likewise. > (__atomic_add_fetchable, __add_fetch_flt): Likewise. > (__atomic_sub_fetchable, __sub_fetch_flt): Likewise. > > Signed-off-by: Matthew Malcomson <mmalcom...@nvidia.com> > Co-authored-by: Jonathan Wakely <jwak...@redhat.com> > > --- > > > On 2/8/25 14:05, Jonathan Wakely wrote: > > External email: Use caution opening links or attachments > > > > > > On Sat, 8 Feb 2025 at 10:55, Matthew Malcomson <mmalcom...@nvidia.com> > > wrote: > >> > >> Hi Jonathan! > >> > >> Many thanks! Will learn the libstdc++ style eventually. > >> > >> I've run bootstrap & regression test on this, and did the manual checks > >> I mentioned before of compiling atomic_float/1.cc with clang and then > >> adding `+ 1` on the builtin codepath to check that the clang binary > >> aborts while the GCC built binary passes. > > > > Great, thanks for checking it. I'll get this pushed early next week. > > > > > >> > >> Thanks again! > >> Matthew > >> > >> On 2/7/25 15:33, Jonathan Wakely wrote: > >>> External email: Use caution opening links or attachments > >>> > >>> > >>> On 05/02/25 13:43 +0000, Jonathan Wakely wrote: > >>>> On 28/10/24 17:15 +0000, mmalcom...@nvidia.com wrote: > >>>>> From: Matthew Malcomson <mmalcom...@nvidia.com> > >>>>> > >>>>> I noticed that the libstdc++ patch is essentially separate and figured I > >>>>> could send it upstream earlier to give reviewers more time to look at > >>>>> it. > >>>>> I am still working on adding the ability to use floating point types in > >>>>> the __atomic_fetch_add builtins > >>>>> > >>>>> Review of current state and motivation (for anyone reading this that has > >>>>> not already seen the previous patches): > >>>>> - Some hardware has support for floating point atomic fetch_add (and > >>>>> similar). > >>>>> - There are existing compilers targetting this hardware that use > >>>>> libstdc++ -- e.g. NVC++. > >>>>> - Since the libstdc++ atomic<float>::fetch_add and similar is written > >>>>> directly as a CAS loop these compilers can not emit optimal code when > >>>>> seeing such constructs. > >>>>> - I hope to use __atomic_fetch_add builtins on floating point types > >>>>> directly in libstdc++ so these compilers can emit better code. > >>>>> - Clang already handles some floating point types in the > >>>>> __atomic_fetch_add family of builtins. > >>>>> - In order to only use this when available, I originally thought I could > >>>>> check against the resolved versions of the builtin in a manner > >>>>> something like `__has_builtin(__atomic_fetch_add_<fp-suffix>)`. > >>>>> I then realised that clang does not expose resolved versions of these > >>>>> atomic builtins to the user. > >>>>> From the clang discourse it was suggested we instead use SFINAE (which > >>>>> clang already supports). > >>>>> - I have posted a patch for allowing the use of SFINAE on builtins (not > >>>>> yet reviewed). > >>>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664999.html > >>>>> *This* patch builds and regtests on top of that patch. It does not > >>>>> change what happens for GCC, while it uses the builtin for codegen with > >>>>> clang. > >>>>> - I have previously sent a patchset upstream adding the ability to use > >>>>> __atomic_fetch_add and similar on floating point types. > >>>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663355.html > >>>>> I hope to send a full patchset up soon including the suggestions given > >>>>> there. > >>>>> With that patchset included (plus the automatic linking of libatomic > >>>>> as Joseph pointed out in the email below > >>>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665408.html ) > >>>>> then current GCC should start to use the builtin branch added in this > >>>>> patch. > >>>>> > >>>>> So *currently*, this patch allows external compilers (NVC++ in > >>>>> particular) to generate better code, and similarly lets clang understand > >>>>> the operation better since it maps to a known builtin. > >>>>> > >>>>> I hope that by GCC 15 this patch would also allow GCC to understand the > >>>>> operation better via mapping to a known builtin. > >>>>> > >>>>> ----------------- 8< ----------- >8 ---------------- > >>>>> > >>>>> Points to question here are: > >>>>> 1) Is this the best approach for using SFINAE to check if this builtin > >>>>> has a particular overload? > >>>>> Don't know of a better approach, but not an expert in C++ templating. > >>>> > >>>> Concepts! > >>>> > >>>>> We still need the CAS loop fallback for any compiler that doesn't > >>>>> implement this builtin. Once all compilers we care about implement this > >>>>> we can remove this special handling and merge the floating point and > >>>>> integral operations into the same template. > >>>>> > >>>>> Testing done: > >>>>> N.b. all testing done on top of the patch introducing SFINAE on builtins > >>>>> here, all testing done on AArch64: > >>>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664999.html > >>>>> > >>>>> 1) No change seene in bootstrap & regression test on AArch64 > >>>>> 2) Manually ran the atomic_float/1.cc testcase with clang++ 19 and > >>>>> things passed. With clang++ 18 there was a failure independent to > >>>>> this change where the use of `is_lock_free` in the testcase was not > >>>>> optimised away so we got a linker error. After editing the testcase > >>>>> it also passes with clang++ 18. > >>>>> 3) Manually checked that when compiling with clang we follow the branch > >>>>> that uses the builtin for `float` (because clang has already > >>>>> implemented these builtins for `float`). > >>>>> - Done by adding `+1` to the result of that branch and checking that > >>>>> we abort when running the result. > >>>>> 4) Manually checked that when compiling with GCC we follow the branch > >>>>> that does not use the builtin for `float`. > >>>>> - Done this by adding the same temporary bug to the header in the > >>>>> builtin branch, and re-running tests to see that we still pass with > >>>>> GCC. > >>>>> > >>>>> Ok for trunk? > >>>>> > >>>>> libstdc++-v3/ChangeLog: > >>>>> > >>>>> * include/bits/atomic_base.h > >>>>> (__atomic_impl::__is_atomic_fetch_add_available): Define new > >>>>> struct using SFINAE to check whether __atomic_fetch_add is > >>>>> implemented on floating point type. > >>>>> (std::__atomic_impl::__fetch_add_flt): `if constexpr` branch > >>>>> on the above SFINAE test to use __atomic_fetch_add when > >>>>> available. > >>>>> (__atomic_impl::__is_atomic_add_fetch_available, > >>>>> std::__atomic_impl::__add_fetch_flt): Likewise. > >>>>> (__atomic_impl::__is_atomic_fetch_sub_available, > >>>>> std::__atomic_impl::__fetch_sub_flt): Likewise. > >>>>> (__atomic_impl::__is_atomic_sub_fetch_available, > >>>>> std::__atomic_impl::__sub_fetch_flt): Likewise. > >>>>> > >>>>> Signed-off-by: Matthew Malcomson <mmalcom...@nvidia.com> > >>>>> --- > >>>>> libstdc++-v3/include/bits/atomic_base.h | 116 ++++++++++++++++++------ > >>>>> 1 file changed, 90 insertions(+), 26 deletions(-) > >>>>> > >>>>> diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/ > >>>>> include/bits/atomic_base.h > >>>>> index 72cc4bae6cf..d7671b0e9d2 100644 > >>>>> --- a/libstdc++-v3/include/bits/atomic_base.h > >>>>> +++ b/libstdc++-v3/include/bits/atomic_base.h > >>>>> @@ -1209,54 +1209,118 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > >>>>> __xor_fetch(_Tp* __ptr, _Val<_Tp> __i) noexcept > >>>>> { return __atomic_xor_fetch(__ptr, __i, __ATOMIC_SEQ_CST); } > >>>>> > >>>>> + template <typename _Tp, typename = void> > >>>>> + struct __is_atomic_fetch_add_available : false_type > >>>>> + { > >>>>> + }; > >>>>> + template <typename T> > >>>> > >>>> This T isn't uglified as _Tp, but it doesn't matter because ... > >>>> > >>>>> + struct __is_atomic_fetch_add_available< > >>>>> + T, std::void_t<decltype (__atomic_fetch_add ( > >>>>> + std::declval<T *> (), std::declval<T> (), int ()))>> : > >>>>> true_type > >>>>> + { > >>>>> + }; > >>>>> + > >>>> > >>>> Atomics for floating-point types are only in C++20, so we can use > >>>> Concepts for these constraints. > >>>> > >>>> template<typename _Tp> > >>>> concept __atomic_fetch_addable > >>>> = requires (_Tp __t) { __atomic_fetch_add(&__t, __t, 1); }; > >>>> > >>>> I assume this will work fine if the void_t version worked. > >>>> > >>>>> template<typename _Tp> > >>>>> _Tp > >>>>> __fetch_add_flt(_Tp* __ptr, _Val<_Tp> __i, memory_order __m) > >>>>> noexcept > >>>>> { > >>>>> - _Val<_Tp> __oldval = load(__ptr, memory_order_relaxed); > >>>>> - _Val<_Tp> __newval = __oldval + __i; > >>>>> - while (!compare_exchange_weak(__ptr, __oldval, __newval, __m, > >>>>> - memory_order_relaxed)) > >>>>> - __newval = __oldval + __i; > >>>>> - return __oldval; > >>>>> + if constexpr (__is_atomic_fetch_add_available<_Tp>::value) > >>>> > >>>> Then this would be just: > >>>> > >>>> if constexpr (__atomic_fetch_addable<_Tp>) > >>>> > >>>> i.e. no ::value > >>>> > >>>>> + return __atomic_fetch_add (__ptr, __i, int (__m)); > >>> > >>> Also, we don't put a space before the opening paren in libstdc++ code. > >>> That weird GNU-ism isn't used here :-) > >>> > >>> I've attached a revised patch with my suggested changes. I've tested > >>> this with GCC trunk on x86_64-linux. Could you re-run your tests with > >>> this please? > >>> > >>> > >> > > >