On Wed, 12 Feb 2025 at 12:05, Matthew Malcomson <mmalcom...@nvidia.com> wrote:
>
> Hi there -- since you've not pushed quite yet can I ask that you use the
> following commit message instead of the existing one?
> (Updated a few comments to match what has changed since I wrote that
> message).
>
> Apologies for not noticing it earlier.

Sure, I'll update it before I push.


>
>
> ----
> libstdc++: Conditionally use floating-point fetch_add builtins
>
> - Some hardware has support for floating point atomic fetch_add (and
>    similar).
> - There are existing compilers targetting this hardware that use
>    libstdc++ -- e.g. NVC++.
> - Since the libstdc++ atomic<float>::fetch_add and similar is written
>    directly as a CAS loop these compilers can not emit optimal code when
>    seeing such constructs.
> - I hope to use __atomic_fetch_add builtins on floating point types
>    directly in libstdc++ so these compilers can emit better code.
> - Clang already handles some floating point types in the
>    __atomic_fetch_add family of builtins.
> - In order to only use this when available, I originally thought I could
>    check against the resolved versions of the builtin in a manner
>    something like `__has_builtin(__atomic_fetch_add_<fp-suffix>)`.
>    I then realised that clang does not expose resolved versions of these
>    atomic builtins to the user.
>    From the clang discourse it was suggested we instead use SFINAE (which
>    clang already supports).
> - I have recently pushed a patch for allowing the use of SFINAE on
>    builtins.
>    https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664999.html
>    Now that patch is upstream, this patch does not change what happens
>    for GCC, while it uses the builtin for codegen with clang.
> - I have previously sent a patchset upstream adding the ability to use
>    __atomic_fetch_add and similar on floating point types.
>    https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668754.html
>    Once that patchset is upstream (plus the automatic linking of
>    libatomic as Joseph pointed out in the email below
>    https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665408.html )
>    then current GCC should start to use the builtin branch added in this
>    patch.
>
> So *currently*, this patch allows external compilers (NVC++ in
> particular) to generate better code, and similarly lets clang understand
> the operation better since it maps to a known builtin.
>
> I hope that by GCC 16 this patch would also allow GCC to understand the
> operation better via mapping to a known builtin.
>
> Testing done:
> 1) No change seen in bootstrap & regression test on AArch64
> 2) No change seen in bootstrap & regression test on x86_64 (Jonathan did
>     this test).
> 3) Manually checked that when compiling with clang we follow the branch
>     that uses the builtin for `float` (because clang has already
>     implemented these builtins for `float`).
>     - Done by adding `+1` to the result of the builtin branch and
>       checking that we abort when running the result for each of
>       fetch_add/fetch_sub/add_fetch/sub_fetch in turn (i.e. 4 separate
>       manual checks).
> 4) Manually checked that when compiling with GCC we follow the branch
>     that does not use the builtin for `float`.
>     - Done this by adding the same temporary bug to the header in the
>       builtin branch, and re-running tests to see that we still pass with
>       GCC.
>
> libstdc++-v3/ChangeLog:
>
>          * include/bits/atomic_base.h (__atomic_fetch_addable): Define
>          new concept.
>          (__atomic_impl::__fetch_add_flt): Use new concept to make use of
>          __atomic_fetch_add when available.
>          (__atomic_fetch_subtractable, __fetch_sub_flt): Likewise.
>          (__atomic_add_fetchable, __add_fetch_flt): Likewise.
>          (__atomic_sub_fetchable, __sub_fetch_flt): Likewise.
>
> Signed-off-by: Matthew Malcomson <mmalcom...@nvidia.com>
> Co-authored-by: Jonathan Wakely <jwak...@redhat.com>
>
> ---
>
>
> On 2/8/25 14:05, Jonathan Wakely wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On Sat, 8 Feb 2025 at 10:55, Matthew Malcomson <mmalcom...@nvidia.com> 
> > wrote:
> >>
> >> Hi Jonathan!
> >>
> >> Many thanks!  Will learn the libstdc++ style eventually.
> >>
> >> I've run bootstrap & regression test on this, and did the manual checks
> >> I mentioned before of compiling atomic_float/1.cc with clang and then
> >> adding `+ 1` on the builtin codepath to check that the clang binary
> >> aborts while the GCC built binary passes.
> >
> > Great, thanks for checking it. I'll get this pushed early next week.
> >
> >
> >>
> >> Thanks again!
> >> Matthew
> >>
> >> On 2/7/25 15:33, Jonathan Wakely wrote:
> >>> External email: Use caution opening links or attachments
> >>>
> >>>
> >>> On 05/02/25 13:43 +0000, Jonathan Wakely wrote:
> >>>> On 28/10/24 17:15 +0000, mmalcom...@nvidia.com wrote:
> >>>>> From: Matthew Malcomson <mmalcom...@nvidia.com>
> >>>>>
> >>>>> I noticed that the libstdc++ patch is essentially separate and figured I
> >>>>> could send it upstream earlier to give reviewers more time to look at
> >>>>> it.
> >>>>> I am still working on adding the ability to use floating point types in
> >>>>> the __atomic_fetch_add builtins
> >>>>>
> >>>>> Review of current state and motivation (for anyone reading this that has
> >>>>> not already seen the previous patches):
> >>>>> - Some hardware has support for floating point atomic fetch_add (and
> >>>>> similar).
> >>>>> - There are existing compilers targetting this hardware that use
> >>>>> libstdc++ -- e.g. NVC++.
> >>>>> - Since the libstdc++ atomic<float>::fetch_add and similar is written
> >>>>> directly as a CAS loop these compilers can not emit optimal code when
> >>>>> seeing such constructs.
> >>>>> - I hope to use __atomic_fetch_add builtins on floating point types
> >>>>> directly in libstdc++ so these compilers can emit better code.
> >>>>> - Clang already handles some floating point types in the
> >>>>> __atomic_fetch_add family of builtins.
> >>>>> - In order to only use this when available, I originally thought I could
> >>>>> check against the resolved versions of the builtin in a manner
> >>>>> something like `__has_builtin(__atomic_fetch_add_<fp-suffix>)`.
> >>>>> I then realised that clang does not expose resolved versions of these
> >>>>> atomic builtins to the user.
> >>>>>  From the clang discourse it was suggested we instead use SFINAE (which
> >>>>> clang already supports).
> >>>>> - I have posted a patch for allowing the use of SFINAE on builtins (not
> >>>>> yet reviewed).
> >>>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664999.html
> >>>>> *This* patch builds and regtests on top of that patch.  It does not
> >>>>> change what happens for GCC, while it uses the builtin for codegen with
> >>>>> clang.
> >>>>> - I have previously sent a patchset upstream adding the ability to use
> >>>>> __atomic_fetch_add and similar on floating point types.
> >>>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663355.html
> >>>>> I hope to send a full patchset up soon including the suggestions given
> >>>>> there.
> >>>>> With that patchset included (plus the automatic linking of libatomic
> >>>>> as Joseph pointed out in the email below
> >>>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665408.html )
> >>>>> then current GCC should start to use the builtin branch added in this
> >>>>> patch.
> >>>>>
> >>>>> So *currently*, this patch allows external compilers (NVC++ in
> >>>>> particular) to generate better code, and similarly lets clang understand
> >>>>> the operation better since it maps to a known builtin.
> >>>>>
> >>>>> I hope that by GCC 15 this patch would also allow GCC to understand the
> >>>>> operation better via mapping to a known builtin.
> >>>>>
> >>>>> ----------------- 8< ----------- >8 ----------------
> >>>>>
> >>>>> Points to question here are:
> >>>>> 1) Is this the best approach for using SFINAE to check if this builtin
> >>>>>   has a particular overload?
> >>>>>   Don't know of a better approach, but not an expert in C++ templating.
> >>>>
> >>>> Concepts!
> >>>>
> >>>>> We still need the CAS loop fallback for any compiler that doesn't
> >>>>> implement this builtin.  Once all compilers we care about implement this
> >>>>> we can remove this special handling and merge the floating point and
> >>>>> integral operations into the same template.
> >>>>>
> >>>>> Testing done:
> >>>>> N.b. all testing done on top of the patch introducing SFINAE on builtins
> >>>>> here, all testing done on AArch64:
> >>>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664999.html
> >>>>>
> >>>>> 1) No change seene in bootstrap & regression test on AArch64
> >>>>> 2) Manually ran the atomic_float/1.cc testcase with clang++ 19 and
> >>>>>   things passed.  With clang++ 18 there was a failure independent to
> >>>>>   this change where the use of `is_lock_free` in the testcase was not
> >>>>>   optimised away so we got a linker error.  After editing the testcase
> >>>>>   it also passes with clang++ 18.
> >>>>> 3) Manually checked that when compiling with clang we follow the branch
> >>>>>   that uses the builtin for `float` (because clang has already
> >>>>>   implemented these builtins for `float`).
> >>>>>   - Done by adding `+1` to the result of that branch and checking that
> >>>>>     we abort when running the result.
> >>>>> 4) Manually checked that when compiling with GCC we follow the branch
> >>>>>   that does not use the builtin for `float`.
> >>>>>   - Done this by adding the same temporary bug to the header in the
> >>>>>     builtin branch, and re-running tests to see that we still pass with
> >>>>>     GCC.
> >>>>>
> >>>>> Ok for trunk?
> >>>>>
> >>>>> libstdc++-v3/ChangeLog:
> >>>>>
> >>>>>       * include/bits/atomic_base.h
> >>>>>          (__atomic_impl::__is_atomic_fetch_add_available): Define new
> >>>>>          struct using SFINAE to check whether __atomic_fetch_add is
> >>>>>          implemented on floating point type.
> >>>>>          (std::__atomic_impl::__fetch_add_flt): `if constexpr` branch
> >>>>>          on the above SFINAE test to use __atomic_fetch_add when
> >>>>>          available.
> >>>>>          (__atomic_impl::__is_atomic_add_fetch_available,
> >>>>>          std::__atomic_impl::__add_fetch_flt): Likewise.
> >>>>>          (__atomic_impl::__is_atomic_fetch_sub_available,
> >>>>>          std::__atomic_impl::__fetch_sub_flt): Likewise.
> >>>>>          (__atomic_impl::__is_atomic_sub_fetch_available,
> >>>>>          std::__atomic_impl::__sub_fetch_flt): Likewise.
> >>>>>
> >>>>> Signed-off-by: Matthew Malcomson <mmalcom...@nvidia.com>
> >>>>> ---
> >>>>> libstdc++-v3/include/bits/atomic_base.h | 116 ++++++++++++++++++------
> >>>>> 1 file changed, 90 insertions(+), 26 deletions(-)
> >>>>>
> >>>>> diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/
> >>>>> include/bits/atomic_base.h
> >>>>> index 72cc4bae6cf..d7671b0e9d2 100644
> >>>>> --- a/libstdc++-v3/include/bits/atomic_base.h
> >>>>> +++ b/libstdc++-v3/include/bits/atomic_base.h
> >>>>> @@ -1209,54 +1209,118 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>>>>       __xor_fetch(_Tp* __ptr, _Val<_Tp> __i) noexcept
> >>>>>       { return __atomic_xor_fetch(__ptr, __i, __ATOMIC_SEQ_CST); }
> >>>>>
> >>>>> +      template <typename _Tp, typename = void>
> >>>>> +      struct __is_atomic_fetch_add_available : false_type
> >>>>> +      {
> >>>>> +      };
> >>>>> +      template <typename T>
> >>>>
> >>>> This T isn't uglified as _Tp, but it doesn't matter because ...
> >>>>
> >>>>> +      struct __is_atomic_fetch_add_available<
> >>>>> +     T, std::void_t<decltype (__atomic_fetch_add (
> >>>>> +          std::declval<T *> (), std::declval<T> (), int ()))>> :
> >>>>> true_type
> >>>>> +      {
> >>>>> +      };
> >>>>> +
> >>>>
> >>>> Atomics for floating-point types are only in C++20, so we can use
> >>>> Concepts for these constraints.
> >>>>
> >>>> template<typename _Tp>
> >>>>   concept __atomic_fetch_addable
> >>>>     = requires (_Tp __t) { __atomic_fetch_add(&__t, __t, 1); };
> >>>>
> >>>> I assume this will work fine if the void_t version worked.
> >>>>
> >>>>>     template<typename _Tp>
> >>>>>       _Tp
> >>>>>       __fetch_add_flt(_Tp* __ptr, _Val<_Tp> __i, memory_order __m)
> >>>>> noexcept
> >>>>>       {
> >>>>> -     _Val<_Tp> __oldval = load(__ptr, memory_order_relaxed);
> >>>>> -     _Val<_Tp> __newval = __oldval + __i;
> >>>>> -     while (!compare_exchange_weak(__ptr, __oldval, __newval, __m,
> >>>>> -                                   memory_order_relaxed))
> >>>>> -       __newval = __oldval + __i;
> >>>>> -     return __oldval;
> >>>>> +     if constexpr (__is_atomic_fetch_add_available<_Tp>::value)
> >>>>
> >>>> Then this would be just:
> >>>>
> >>>>        if constexpr (__atomic_fetch_addable<_Tp>)
> >>>>
> >>>> i.e. no ::value
> >>>>
> >>>>> +       return __atomic_fetch_add (__ptr, __i, int (__m));
> >>>
> >>> Also, we don't put a space before the opening paren in libstdc++ code.
> >>> That weird GNU-ism isn't used here :-)
> >>>
> >>> I've attached a revised patch with my suggested changes. I've tested
> >>> this with GCC trunk on x86_64-linux. Could you re-run your tests with
> >>> this please?
> >>>
> >>>
> >>
> >
>

Reply via email to