Re: [PATCH 3/4] libstdc++: Optimize std::binary_semaphore

Tomasz Kaminski Wed, 04 Jun 2025 05:45:09 -0700

On Tue, Jun 3, 2025 at 6:06 PM Jonathan Wakely <jwak...@redhat.com> wrote:


> On Tue, 3 Jun 2025 at 09:46, Jonathan Wakely wrote:
> >
> > This adds a new implementation of std::counting_semaphore<Max> for the
> > case where Max == 1, i.e. the std::binary_semaphore typedef. When the
> > maximum counter value is 1 we don't need to load the current counter
> > value before doing a compare-exchange to acquire the semaphore. We can
> > just optimisitcally assume it's currently 1, and if that's true then the
> > compare_exchange will succeed. This simplifies _M_try_acquire so that we
> > don't need the separate _M_do_try_acquire function used by the general
> > __semaphore_base implementation for _Max > 1 cases.
> >
> > We can also use the simpler forms of atomic waiting that just take a
> > value instead of a value accessor and predicate, because we know that
> > the _M_counter is always a __platform_wait_t. This change adds a
> > bare_wait flag to __atomic_wait_address_v because we don't need to track
> > waiters for semaphores, we only need to notify when a semaphore with a
> > count of zero is released.
> >
> > I'm not sure if this makes the code any faster in real scenarios, but
> > the generated code for std::binary_semaphore is slightly smaller now.
> >
> > libstdc++-v3/ChangeLog:
> >
> >         * include/bits/semaphore_base.h (__binary_semaphore_impl): New
> >         class with optimized implementation for std::binary_semaphore.
> >         (__semaphore_impl) <_max == 1>: Modify alias template to use
> >         __binary_semaphore_impl.
> >         * include/bits/atomic_wait.h (__atomic_wait_address_v): Add
> >         parameter for bare waits.
> > ---
> >
> > Tested x86_64-linux.
> >
> >  libstdc++-v3/include/bits/atomic_wait.h    |  5 +-
> >  libstdc++-v3/include/bits/semaphore_base.h | 73 +++++++++++++++++++++-
> >  2 files changed, 73 insertions(+), 5 deletions(-)
> >
> > diff --git a/libstdc++-v3/include/bits/atomic_wait.h
> b/libstdc++-v3/include/bits/atomic_wait.h
> > index 815726c16ccb..9ae11191d9ab 100644
> > --- a/libstdc++-v3/include/bits/atomic_wait.h
> > +++ b/libstdc++-v3/include/bits/atomic_wait.h
> > @@ -249,12 +249,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >        // C++26 will return __val
> >      }
> >
> > +  // Wait on __addr while *__addr == __old is true.
> >    inline void
> >    __atomic_wait_address_v(const __detail::__platform_wait_t* __addr,
> >                           __detail::__platform_wait_t __old,
> > -                         int __order)
> > +                         int __order, bool __bare_wait = false)
> >    {
> > -    __detail::__wait_args __args{ __addr, __old, __order };
> > +    __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
> >      // C++26 will not ignore the return value here
> >      __detail::__wait_impl(__addr, __args);
>
> This crashes here on Solaris, this function doesn't work for proxy
> waits (i.e. targets without futex). It needs to use
> __args._M_setup_wait to get the proxy wait state, or needs the
> additional fix pasted below. Patches 3/4 and 4/4 depend on that fix.
>
Given that we would need a proxy wait, that will cause a load from the
proxy variable,
is there any benefit from having this partial specialization on such a
target?
Proxy waits perform atomic wait anyway. The alternative may be to define
this specialization
whef _GLIBCXX_HAVE_PLATFORM_WAIT


> Patches 1/4 (which fixes the deadlock bug) and 2/4 (which is not
> essential but IMHO makes the code simpler) are still OK and pass all
> tests on Solaris without this (because only 3/4 introduces uses of the
> __atomic_wait_address_v overload that doesn't support proxy waits).
>
>
> --- a/libstdc++-v3/include/bits/atomic_wait.h
> +++ b/libstdc++-v3/include/bits/atomic_wait.h
> @@ -249,14 +249,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>       // C++26 will return __val
>     }
>
> +  // Wait on __addr while *__addr == __old is true.
>   inline void
>   __atomic_wait_address_v(const __detail::__platform_wait_t* __addr,
>                          __detail::__platform_wait_t __old,
> -                         int __order)
> +                         int __order, bool __bare_wait = false)
>   {
> -    __detail::__wait_args __args{ __addr, __old, __order };
> +    // If we have a platform wait function (e.g. futex wait) that
> atomically
> +    // loads *__addr and compares it to __args._M_old then we can use
> that.
> +    // Otherwise we need to use an accessor function to do that load and a
> +    // predicate function to do the comparison.
> +#if _GLIBCXX_HAVE_PLATFORM_WAIT
> +    __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
>     // C++26 will not ignore the return value here
>     __detail::__wait_impl(__addr, __args);
> +#else
> +    auto __vfn = [&__addr] { return __atomic_load_n(__addr, __order); };
> +    auto __pfn = [&__old](const __detail::__platform_wait_t& __val) {
> +      return !__detail::__atomic_eq(__old, __val);
> +    };
> +    return std::__atomic_wait_address(__addr, __pfn, __vfn, __bare_wait);
> +#endif
>   }
>
>   // Wait on __addr while __vfn() == __old is true.
>
>

Re: [PATCH 3/4] libstdc++: Optimize std::binary_semaphore

Reply via email to