On Tue, Jun 3, 2025 at 6:06 PM Jonathan Wakely <jwak...@redhat.com> wrote:
> On Tue, 3 Jun 2025 at 09:46, Jonathan Wakely wrote: > > > > This adds a new implementation of std::counting_semaphore<Max> for the > > case where Max == 1, i.e. the std::binary_semaphore typedef. When the > > maximum counter value is 1 we don't need to load the current counter > > value before doing a compare-exchange to acquire the semaphore. We can > > just optimisitcally assume it's currently 1, and if that's true then the > > compare_exchange will succeed. This simplifies _M_try_acquire so that we > > don't need the separate _M_do_try_acquire function used by the general > > __semaphore_base implementation for _Max > 1 cases. > > > > We can also use the simpler forms of atomic waiting that just take a > > value instead of a value accessor and predicate, because we know that > > the _M_counter is always a __platform_wait_t. This change adds a > > bare_wait flag to __atomic_wait_address_v because we don't need to track > > waiters for semaphores, we only need to notify when a semaphore with a > > count of zero is released. > > > > I'm not sure if this makes the code any faster in real scenarios, but > > the generated code for std::binary_semaphore is slightly smaller now. > > > > libstdc++-v3/ChangeLog: > > > > * include/bits/semaphore_base.h (__binary_semaphore_impl): New > > class with optimized implementation for std::binary_semaphore. > > (__semaphore_impl) <_max == 1>: Modify alias template to use > > __binary_semaphore_impl. > > * include/bits/atomic_wait.h (__atomic_wait_address_v): Add > > parameter for bare waits. > > --- > > > > Tested x86_64-linux. > > > > libstdc++-v3/include/bits/atomic_wait.h | 5 +- > > libstdc++-v3/include/bits/semaphore_base.h | 73 +++++++++++++++++++++- > > 2 files changed, 73 insertions(+), 5 deletions(-) > > > > diff --git a/libstdc++-v3/include/bits/atomic_wait.h > b/libstdc++-v3/include/bits/atomic_wait.h > > index 815726c16ccb..9ae11191d9ab 100644 > > --- a/libstdc++-v3/include/bits/atomic_wait.h > > +++ b/libstdc++-v3/include/bits/atomic_wait.h > > @@ -249,12 +249,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > // C++26 will return __val > > } > > > > + // Wait on __addr while *__addr == __old is true. > > inline void > > __atomic_wait_address_v(const __detail::__platform_wait_t* __addr, > > __detail::__platform_wait_t __old, > > - int __order) > > + int __order, bool __bare_wait = false) > > { > > - __detail::__wait_args __args{ __addr, __old, __order }; > > + __detail::__wait_args __args{ __addr, __old, __order, __bare_wait }; > > // C++26 will not ignore the return value here > > __detail::__wait_impl(__addr, __args); > > This crashes here on Solaris, this function doesn't work for proxy > waits (i.e. targets without futex). It needs to use > __args._M_setup_wait to get the proxy wait state, or needs the > additional fix pasted below. Patches 3/4 and 4/4 depend on that fix. > Given that we would need a proxy wait, that will cause a load from the proxy variable, is there any benefit from having this partial specialization on such a target? Proxy waits perform atomic wait anyway. The alternative may be to define this specialization whef _GLIBCXX_HAVE_PLATFORM_WAIT > Patches 1/4 (which fixes the deadlock bug) and 2/4 (which is not > essential but IMHO makes the code simpler) are still OK and pass all > tests on Solaris without this (because only 3/4 introduces uses of the > __atomic_wait_address_v overload that doesn't support proxy waits). > > > --- a/libstdc++-v3/include/bits/atomic_wait.h > +++ b/libstdc++-v3/include/bits/atomic_wait.h > @@ -249,14 +249,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > // C++26 will return __val > } > > + // Wait on __addr while *__addr == __old is true. > inline void > __atomic_wait_address_v(const __detail::__platform_wait_t* __addr, > __detail::__platform_wait_t __old, > - int __order) > + int __order, bool __bare_wait = false) > { > - __detail::__wait_args __args{ __addr, __old, __order }; > + // If we have a platform wait function (e.g. futex wait) that > atomically > + // loads *__addr and compares it to __args._M_old then we can use > that. > + // Otherwise we need to use an accessor function to do that load and a > + // predicate function to do the comparison. > +#if _GLIBCXX_HAVE_PLATFORM_WAIT > + __detail::__wait_args __args{ __addr, __old, __order, __bare_wait }; > // C++26 will not ignore the return value here > __detail::__wait_impl(__addr, __args); > +#else > + auto __vfn = [&__addr] { return __atomic_load_n(__addr, __order); }; > + auto __pfn = [&__old](const __detail::__platform_wait_t& __val) { > + return !__detail::__atomic_eq(__old, __val); > + }; > + return std::__atomic_wait_address(__addr, __pfn, __vfn, __bare_wait); > +#endif > } > > // Wait on __addr while __vfn() == __old is true. > >