Hi Jonathan!

Many thanks!  Will learn the libstdc++ style eventually.

I've run bootstrap & regression test on this, and did the manual checks I mentioned before of compiling atomic_float/1.cc with clang and then adding `+ 1` on the builtin codepath to check that the clang binary aborts while the GCC built binary passes.

Thanks again!
Matthew

On 2/7/25 15:33, Jonathan Wakely wrote:
External email: Use caution opening links or attachments


On 05/02/25 13:43 +0000, Jonathan Wakely wrote:
On 28/10/24 17:15 +0000, mmalcom...@nvidia.com wrote:
From: Matthew Malcomson <mmalcom...@nvidia.com>

I noticed that the libstdc++ patch is essentially separate and figured I
could send it upstream earlier to give reviewers more time to look at
it.
I am still working on adding the ability to use floating point types in
the __atomic_fetch_add builtins

Review of current state and motivation (for anyone reading this that has
not already seen the previous patches):
- Some hardware has support for floating point atomic fetch_add (and
similar).
- There are existing compilers targetting this hardware that use
libstdc++ -- e.g. NVC++.
- Since the libstdc++ atomic<float>::fetch_add and similar is written
directly as a CAS loop these compilers can not emit optimal code when
seeing such constructs.
- I hope to use __atomic_fetch_add builtins on floating point types
directly in libstdc++ so these compilers can emit better code.
- Clang already handles some floating point types in the
__atomic_fetch_add family of builtins.
- In order to only use this when available, I originally thought I could
check against the resolved versions of the builtin in a manner
something like `__has_builtin(__atomic_fetch_add_<fp-suffix>)`.
I then realised that clang does not expose resolved versions of these
atomic builtins to the user.
From the clang discourse it was suggested we instead use SFINAE (which
clang already supports).
- I have posted a patch for allowing the use of SFINAE on builtins (not
yet reviewed).
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664999.html
*This* patch builds and regtests on top of that patch.  It does not
change what happens for GCC, while it uses the builtin for codegen with
clang.
- I have previously sent a patchset upstream adding the ability to use
__atomic_fetch_add and similar on floating point types.
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663355.html
I hope to send a full patchset up soon including the suggestions given
there.
With that patchset included (plus the automatic linking of libatomic
as Joseph pointed out in the email below
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665408.html )
then current GCC should start to use the builtin branch added in this
patch.

So *currently*, this patch allows external compilers (NVC++ in
particular) to generate better code, and similarly lets clang understand
the operation better since it maps to a known builtin.

I hope that by GCC 15 this patch would also allow GCC to understand the
operation better via mapping to a known builtin.

----------------- 8< ----------- >8 ----------------

Points to question here are:
1) Is this the best approach for using SFINAE to check if this builtin
 has a particular overload?
 Don't know of a better approach, but not an expert in C++ templating.

Concepts!

We still need the CAS loop fallback for any compiler that doesn't
implement this builtin.  Once all compilers we care about implement this
we can remove this special handling and merge the floating point and
integral operations into the same template.

Testing done:
N.b. all testing done on top of the patch introducing SFINAE on builtins
here, all testing done on AArch64:
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664999.html

1) No change seene in bootstrap & regression test on AArch64
2) Manually ran the atomic_float/1.cc testcase with clang++ 19 and
 things passed.  With clang++ 18 there was a failure independent to
 this change where the use of `is_lock_free` in the testcase was not
 optimised away so we got a linker error.  After editing the testcase
 it also passes with clang++ 18.
3) Manually checked that when compiling with clang we follow the branch
 that uses the builtin for `float` (because clang has already
 implemented these builtins for `float`).
 - Done by adding `+1` to the result of that branch and checking that
   we abort when running the result.
4) Manually checked that when compiling with GCC we follow the branch
 that does not use the builtin for `float`.
 - Done this by adding the same temporary bug to the header in the
   builtin branch, and re-running tests to see that we still pass with
   GCC.

Ok for trunk?

libstdc++-v3/ChangeLog:

     * include/bits/atomic_base.h
        (__atomic_impl::__is_atomic_fetch_add_available): Define new
        struct using SFINAE to check whether __atomic_fetch_add is
        implemented on floating point type.
        (std::__atomic_impl::__fetch_add_flt): `if constexpr` branch
        on the above SFINAE test to use __atomic_fetch_add when
        available.
        (__atomic_impl::__is_atomic_add_fetch_available,
        std::__atomic_impl::__add_fetch_flt): Likewise.
        (__atomic_impl::__is_atomic_fetch_sub_available,
        std::__atomic_impl::__fetch_sub_flt): Likewise.
        (__atomic_impl::__is_atomic_sub_fetch_available,
        std::__atomic_impl::__sub_fetch_flt): Likewise.

Signed-off-by: Matthew Malcomson <mmalcom...@nvidia.com>
---
libstdc++-v3/include/bits/atomic_base.h | 116 ++++++++++++++++++------
1 file changed, 90 insertions(+), 26 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/ include/bits/atomic_base.h
index 72cc4bae6cf..d7671b0e9d2 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -1209,54 +1209,118 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     __xor_fetch(_Tp* __ptr, _Val<_Tp> __i) noexcept
     { return __atomic_xor_fetch(__ptr, __i, __ATOMIC_SEQ_CST); }

+      template <typename _Tp, typename = void>
+      struct __is_atomic_fetch_add_available : false_type
+      {
+      };
+      template <typename T>

This T isn't uglified as _Tp, but it doesn't matter because ...

+      struct __is_atomic_fetch_add_available<
+     T, std::void_t<decltype (__atomic_fetch_add (
+          std::declval<T *> (), std::declval<T> (), int ()))>> : true_type
+      {
+      };
+

Atomics for floating-point types are only in C++20, so we can use
Concepts for these constraints.

template<typename _Tp>
 concept __atomic_fetch_addable
   = requires (_Tp __t) { __atomic_fetch_add(&__t, __t, 1); };

I assume this will work fine if the void_t version worked.

   template<typename _Tp>
     _Tp
     __fetch_add_flt(_Tp* __ptr, _Val<_Tp> __i, memory_order __m) noexcept
     {
-     _Val<_Tp> __oldval = load(__ptr, memory_order_relaxed);
-     _Val<_Tp> __newval = __oldval + __i;
-     while (!compare_exchange_weak(__ptr, __oldval, __newval, __m,
-                                   memory_order_relaxed))
-       __newval = __oldval + __i;
-     return __oldval;
+     if constexpr (__is_atomic_fetch_add_available<_Tp>::value)

Then this would be just:

      if constexpr (__atomic_fetch_addable<_Tp>)

i.e. no ::value

+       return __atomic_fetch_add (__ptr, __i, int (__m));

Also, we don't put a space before the opening paren in libstdc++ code.
That weird GNU-ism isn't used here :-)

I've attached a revised patch with my suggested changes. I've tested
this with GCC trunk on x86_64-linux. Could you re-run your tests with
this please?



Reply via email to