Re: [PATCH 3/4] libstdc++: Optimize std::binary_semaphore

Jonathan Wakely Wed, 04 Jun 2025 05:57:58 -0700

On Wed, 4 Jun 2025 at 13:44, Tomasz Kaminski <tkami...@redhat.com> wrote:
>
> On Tue, Jun 3, 2025 at 6:06 PM Jonathan Wakely <jwak...@redhat.com> wrote:
>>
>> On Tue, 3 Jun 2025 at 09:46, Jonathan Wakely wrote:
>> >
>> > This adds a new implementation of std::counting_semaphore<Max> for the
>> > case where Max == 1, i.e. the std::binary_semaphore typedef. When the
>> > maximum counter value is 1 we don't need to load the current counter
>> > value before doing a compare-exchange to acquire the semaphore. We can
>> > just optimisitcally assume it's currently 1, and if that's true then the
>> > compare_exchange will succeed. This simplifies _M_try_acquire so that we
>> > don't need the separate _M_do_try_acquire function used by the general
>> > __semaphore_base implementation for _Max > 1 cases.
>> >
>> > We can also use the simpler forms of atomic waiting that just take a
>> > value instead of a value accessor and predicate, because we know that
>> > the _M_counter is always a __platform_wait_t. This change adds a
>> > bare_wait flag to __atomic_wait_address_v because we don't need to track
>> > waiters for semaphores, we only need to notify when a semaphore with a
>> > count of zero is released.
>> >
>> > I'm not sure if this makes the code any faster in real scenarios, but
>> > the generated code for std::binary_semaphore is slightly smaller now.
>> >
>> > libstdc++-v3/ChangeLog:
>> >
>> >         * include/bits/semaphore_base.h (__binary_semaphore_impl): New
>> >         class with optimized implementation for std::binary_semaphore.
>> >         (__semaphore_impl) <_max == 1>: Modify alias template to use
>> >         __binary_semaphore_impl.
>> >         * include/bits/atomic_wait.h (__atomic_wait_address_v): Add
>> >         parameter for bare waits.
>> > ---
>> >
>> > Tested x86_64-linux.
>> >
>> >  libstdc++-v3/include/bits/atomic_wait.h    |  5 +-
>> >  libstdc++-v3/include/bits/semaphore_base.h | 73 +++++++++++++++++++++-
>> >  2 files changed, 73 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/libstdc++-v3/include/bits/atomic_wait.h 
>> > b/libstdc++-v3/include/bits/atomic_wait.h
>> > index 815726c16ccb..9ae11191d9ab 100644
>> > --- a/libstdc++-v3/include/bits/atomic_wait.h
>> > +++ b/libstdc++-v3/include/bits/atomic_wait.h
>> > @@ -249,12 +249,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> >        // C++26 will return __val
>> >      }
>> >
>> > +  // Wait on __addr while *__addr == __old is true.
>> >    inline void
>> >    __atomic_wait_address_v(const __detail::__platform_wait_t* __addr,
>> >                           __detail::__platform_wait_t __old,
>> > -                         int __order)
>> > +                         int __order, bool __bare_wait = false)
>> >    {
>> > -    __detail::__wait_args __args{ __addr, __old, __order };
>> > +    __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
>> >      // C++26 will not ignore the return value here
>> >      __detail::__wait_impl(__addr, __args);
>>
>> This crashes here on Solaris, this function doesn't work for proxy
>> waits (i.e. targets without futex). It needs to use
>> __args._M_setup_wait to get the proxy wait state, or needs the


N.B. getting a proxy wait state isn't sufficient, we also need to
reload *addr after setting __args._M_old = __args._M_state->_M_ver and
then fudge _M_old if *addr has already changed, so that __wait_impl
won't sleep waiting for a change that already happened.

>> additional fix pasted below. Patches 3/4 and 4/4 depend on that fix.
>
> Given that we would need a proxy wait, that will cause a load from the proxy 
> variable,
> is there any benefit from having this partial specialization on such a target?
> Proxy waits perform atomic wait anyway. The alternative may be to define this 
> specialization
> whef _GLIBCXX_HAVE_PLATFORM_WAIT

Yes, I came to the same conclusion earlier this morning. There's no
benefit to the optimized implementations for proxy waits. We should
either check _GLIBCXX_HAVE_PLATFORM_WAIT or the
__platform_wait_uses_type<__count_type> variable template (which is
always false if HAVE_PLATFORM_WAIT is not defined).

If we don't try to use __atomic_wait_address_v when we will do a proxy
wait, then we don't need to "fix" it to work for proxy waits. Instead
we could just add an assertion to the functions that can't be used for
proxy waits, to make sure they aren't used by mistake (as I tried to
do).


>
>>
>> Patches 1/4 (which fixes the deadlock bug) and 2/4 (which is not
>> essential but IMHO makes the code simpler) are still OK and pass all
>> tests on Solaris without this (because only 3/4 introduces uses of the
>> __atomic_wait_address_v overload that doesn't support proxy waits).
>>
>>
>> --- a/libstdc++-v3/include/bits/atomic_wait.h
>> +++ b/libstdc++-v3/include/bits/atomic_wait.h
>> @@ -249,14 +249,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>       // C++26 will return __val
>>     }
>>
>> +  // Wait on __addr while *__addr == __old is true.
>>   inline void
>>   __atomic_wait_address_v(const __detail::__platform_wait_t* __addr,
>>                          __detail::__platform_wait_t __old,
>> -                         int __order)
>> +                         int __order, bool __bare_wait = false)
>>   {
>> -    __detail::__wait_args __args{ __addr, __old, __order };
>> +    // If we have a platform wait function (e.g. futex wait) that atomically
>> +    // loads *__addr and compares it to __args._M_old then we can use that.
>> +    // Otherwise we need to use an accessor function to do that load and a
>> +    // predicate function to do the comparison.
>> +#if _GLIBCXX_HAVE_PLATFORM_WAIT
>> +    __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
>>     // C++26 will not ignore the return value here
>>     __detail::__wait_impl(__addr, __args);
>> +#else
>> +    auto __vfn = [&__addr] { return __atomic_load_n(__addr, __order); };
>> +    auto __pfn = [&__old](const __detail::__platform_wait_t& __val) {
>> +      return !__detail::__atomic_eq(__old, __val);
>> +    };
>> +    return std::__atomic_wait_address(__addr, __pfn, __vfn, __bare_wait);
>> +#endif
>>   }
>>
>>   // Wait on __addr while __vfn() == __old is true.
>>

Re: [PATCH 3/4] libstdc++: Optimize std::binary_semaphore

Reply via email to