On Sat, 8 Feb 2025 at 10:55, Matthew Malcomson <mmalcom...@nvidia.com> wrote:
>
> Hi Jonathan!
>
> Many thanks!  Will learn the libstdc++ style eventually.
>
> I've run bootstrap & regression test on this, and did the manual checks
> I mentioned before of compiling atomic_float/1.cc with clang and then
> adding `+ 1` on the builtin codepath to check that the clang binary
> aborts while the GCC built binary passes.

Great, thanks for checking it. I'll get this pushed early next week.


>
> Thanks again!
> Matthew
>
> On 2/7/25 15:33, Jonathan Wakely wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On 05/02/25 13:43 +0000, Jonathan Wakely wrote:
> >> On 28/10/24 17:15 +0000, mmalcom...@nvidia.com wrote:
> >>> From: Matthew Malcomson <mmalcom...@nvidia.com>
> >>>
> >>> I noticed that the libstdc++ patch is essentially separate and figured I
> >>> could send it upstream earlier to give reviewers more time to look at
> >>> it.
> >>> I am still working on adding the ability to use floating point types in
> >>> the __atomic_fetch_add builtins
> >>>
> >>> Review of current state and motivation (for anyone reading this that has
> >>> not already seen the previous patches):
> >>> - Some hardware has support for floating point atomic fetch_add (and
> >>> similar).
> >>> - There are existing compilers targetting this hardware that use
> >>> libstdc++ -- e.g. NVC++.
> >>> - Since the libstdc++ atomic<float>::fetch_add and similar is written
> >>> directly as a CAS loop these compilers can not emit optimal code when
> >>> seeing such constructs.
> >>> - I hope to use __atomic_fetch_add builtins on floating point types
> >>> directly in libstdc++ so these compilers can emit better code.
> >>> - Clang already handles some floating point types in the
> >>> __atomic_fetch_add family of builtins.
> >>> - In order to only use this when available, I originally thought I could
> >>> check against the resolved versions of the builtin in a manner
> >>> something like `__has_builtin(__atomic_fetch_add_<fp-suffix>)`.
> >>> I then realised that clang does not expose resolved versions of these
> >>> atomic builtins to the user.
> >>> From the clang discourse it was suggested we instead use SFINAE (which
> >>> clang already supports).
> >>> - I have posted a patch for allowing the use of SFINAE on builtins (not
> >>> yet reviewed).
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664999.html
> >>> *This* patch builds and regtests on top of that patch.  It does not
> >>> change what happens for GCC, while it uses the builtin for codegen with
> >>> clang.
> >>> - I have previously sent a patchset upstream adding the ability to use
> >>> __atomic_fetch_add and similar on floating point types.
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663355.html
> >>> I hope to send a full patchset up soon including the suggestions given
> >>> there.
> >>> With that patchset included (plus the automatic linking of libatomic
> >>> as Joseph pointed out in the email below
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665408.html )
> >>> then current GCC should start to use the builtin branch added in this
> >>> patch.
> >>>
> >>> So *currently*, this patch allows external compilers (NVC++ in
> >>> particular) to generate better code, and similarly lets clang understand
> >>> the operation better since it maps to a known builtin.
> >>>
> >>> I hope that by GCC 15 this patch would also allow GCC to understand the
> >>> operation better via mapping to a known builtin.
> >>>
> >>> ----------------- 8< ----------- >8 ----------------
> >>>
> >>> Points to question here are:
> >>> 1) Is this the best approach for using SFINAE to check if this builtin
> >>>  has a particular overload?
> >>>  Don't know of a better approach, but not an expert in C++ templating.
> >>
> >> Concepts!
> >>
> >>> We still need the CAS loop fallback for any compiler that doesn't
> >>> implement this builtin.  Once all compilers we care about implement this
> >>> we can remove this special handling and merge the floating point and
> >>> integral operations into the same template.
> >>>
> >>> Testing done:
> >>> N.b. all testing done on top of the patch introducing SFINAE on builtins
> >>> here, all testing done on AArch64:
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664999.html
> >>>
> >>> 1) No change seene in bootstrap & regression test on AArch64
> >>> 2) Manually ran the atomic_float/1.cc testcase with clang++ 19 and
> >>>  things passed.  With clang++ 18 there was a failure independent to
> >>>  this change where the use of `is_lock_free` in the testcase was not
> >>>  optimised away so we got a linker error.  After editing the testcase
> >>>  it also passes with clang++ 18.
> >>> 3) Manually checked that when compiling with clang we follow the branch
> >>>  that uses the builtin for `float` (because clang has already
> >>>  implemented these builtins for `float`).
> >>>  - Done by adding `+1` to the result of that branch and checking that
> >>>    we abort when running the result.
> >>> 4) Manually checked that when compiling with GCC we follow the branch
> >>>  that does not use the builtin for `float`.
> >>>  - Done this by adding the same temporary bug to the header in the
> >>>    builtin branch, and re-running tests to see that we still pass with
> >>>    GCC.
> >>>
> >>> Ok for trunk?
> >>>
> >>> libstdc++-v3/ChangeLog:
> >>>
> >>>      * include/bits/atomic_base.h
> >>>         (__atomic_impl::__is_atomic_fetch_add_available): Define new
> >>>         struct using SFINAE to check whether __atomic_fetch_add is
> >>>         implemented on floating point type.
> >>>         (std::__atomic_impl::__fetch_add_flt): `if constexpr` branch
> >>>         on the above SFINAE test to use __atomic_fetch_add when
> >>>         available.
> >>>         (__atomic_impl::__is_atomic_add_fetch_available,
> >>>         std::__atomic_impl::__add_fetch_flt): Likewise.
> >>>         (__atomic_impl::__is_atomic_fetch_sub_available,
> >>>         std::__atomic_impl::__fetch_sub_flt): Likewise.
> >>>         (__atomic_impl::__is_atomic_sub_fetch_available,
> >>>         std::__atomic_impl::__sub_fetch_flt): Likewise.
> >>>
> >>> Signed-off-by: Matthew Malcomson <mmalcom...@nvidia.com>
> >>> ---
> >>> libstdc++-v3/include/bits/atomic_base.h | 116 ++++++++++++++++++------
> >>> 1 file changed, 90 insertions(+), 26 deletions(-)
> >>>
> >>> diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/
> >>> include/bits/atomic_base.h
> >>> index 72cc4bae6cf..d7671b0e9d2 100644
> >>> --- a/libstdc++-v3/include/bits/atomic_base.h
> >>> +++ b/libstdc++-v3/include/bits/atomic_base.h
> >>> @@ -1209,54 +1209,118 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>>      __xor_fetch(_Tp* __ptr, _Val<_Tp> __i) noexcept
> >>>      { return __atomic_xor_fetch(__ptr, __i, __ATOMIC_SEQ_CST); }
> >>>
> >>> +      template <typename _Tp, typename = void>
> >>> +      struct __is_atomic_fetch_add_available : false_type
> >>> +      {
> >>> +      };
> >>> +      template <typename T>
> >>
> >> This T isn't uglified as _Tp, but it doesn't matter because ...
> >>
> >>> +      struct __is_atomic_fetch_add_available<
> >>> +     T, std::void_t<decltype (__atomic_fetch_add (
> >>> +          std::declval<T *> (), std::declval<T> (), int ()))>> :
> >>> true_type
> >>> +      {
> >>> +      };
> >>> +
> >>
> >> Atomics for floating-point types are only in C++20, so we can use
> >> Concepts for these constraints.
> >>
> >> template<typename _Tp>
> >>  concept __atomic_fetch_addable
> >>    = requires (_Tp __t) { __atomic_fetch_add(&__t, __t, 1); };
> >>
> >> I assume this will work fine if the void_t version worked.
> >>
> >>>    template<typename _Tp>
> >>>      _Tp
> >>>      __fetch_add_flt(_Tp* __ptr, _Val<_Tp> __i, memory_order __m)
> >>> noexcept
> >>>      {
> >>> -     _Val<_Tp> __oldval = load(__ptr, memory_order_relaxed);
> >>> -     _Val<_Tp> __newval = __oldval + __i;
> >>> -     while (!compare_exchange_weak(__ptr, __oldval, __newval, __m,
> >>> -                                   memory_order_relaxed))
> >>> -       __newval = __oldval + __i;
> >>> -     return __oldval;
> >>> +     if constexpr (__is_atomic_fetch_add_available<_Tp>::value)
> >>
> >> Then this would be just:
> >>
> >>       if constexpr (__atomic_fetch_addable<_Tp>)
> >>
> >> i.e. no ::value
> >>
> >>> +       return __atomic_fetch_add (__ptr, __i, int (__m));
> >
> > Also, we don't put a space before the opening paren in libstdc++ code.
> > That weird GNU-ism isn't used here :-)
> >
> > I've attached a revised patch with my suggested changes. I've tested
> > this with GCC trunk on x86_64-linux. Could you re-run your tests with
> > this please?
> >
> >
>

Reply via email to