On Tue, Jun 3, 2025 at 10:34 AM Jonathan Wakely <jwak...@redhat.com> wrote:
> There's a deadlock in std::counting_semaphore that occurs when the > semaphore is under contention. The bug happens when one thread tries to > acquire the mutex, calling __semaphore_base::_S_do_try_acquire to > atomically decrement the counter using compare_exchange_strong. If the > counter is non-zero (and so should be possible to decrement) but another > thread changes it (either incrementing or decrementing it) then the > compare_exchange fails and _S_do_try_acquire returns false. Because that > function is used by the predicate passed to __atomic_wait_address, when > it returns false the thread does a futex wait until the value changes. > However, when the predicate is false because the compare_exchange failed > due to not matching the expected value, waiting for the value to change > is incorrect. The correct behaviour would be to retry the > compare_exchange using the new value (as long as it's still non-zero). > Waiting for the value to change again means we can block forever, > because it might never change again. > > The predicate should only test the value, not also attempt to alter it, > and its return value should mean only one thing, not conflate a busy > semaphore that cannot be acquired with a contended one that can be > acquired by retrying. > > The correct behaviour of __semaphore_base::_M_acquire is to attempt the > decrement, but to retry immediately if it failed due to contention on > the variable (i.e. due to the variable not having the expected value). > It should only wait for the value to change when the value is zero, > because that's the only time we can't decrement it. > > This commit moves the _S_do_try_acquire call out of the predicate and > loops while it is false, only doing an atomic wait when the the > counter's value is zero. The predicate used for the atomic wait now only > checks whether the value is decrementable (non-zero), without also > trying to perform that decrement. > > In order for the caller to tell whether to retry a failed > _S_do_try_acquire or wait for the value to be non-zero, the value > obtained by a failed compare_exchange needs to be passed back to the > caller. _S_do_try_acquire is changed to take its parameter by reference, > so that the caller gets the new value and can check whether it's zero. > > In order to avoid doing another atomic load after returning from an > atomic wait, the predicate is also changed to take its parameter by > reference, so that when a non-zero value is detected that value is > available to _M_acquire (and can be passed to _S_do_try_acquire as the > expected value of the compare_exchange). Although this means that the > predicate is modifying data again, not just checking a value, this > modification is safe. It's not changing the semaphore's counter, only > changing a local variable in the caller to avoid a redundant atomic > load. > > Equivalent changes are made to _M_try_acquire_until and > _M_try_acquire_for. They have the same bug, although they can escape the > deadlock if the wait is interrupted by timing out. For _M_acquire > there's no time out so it potentially waits forever. > > _M_try_acquire also has the same bug, but can be simplified to just > calling _M_try_acquire_for with a timeout of zero. That will result in > calling __wait_impl with the __spin_only flag set, so that the value is > loaded and checked in a spin loop but there is no sleep. This means that > _M_try_acquire can still succeed under light contention if the counter > is being changed concurrently, at the cost of a little extra overhead. > It would be possible to implement _M_try_acquire as nothing more than an > atomic load and a compare_exchange, but it would fail when there's is > any contention. > > libstdc++-v3/ChangeLog: > > PR libstdc++/104928 > * include/bits/semaphore_base.h (_S_do_try_acquire): Take old > value by reference. > (_M_acquire): Move _S_do_try_acquire call out of the predicate > and loop on its result. Make the predicate capture and update > the local copy of the value. > (_M_try_acquire_until, _M_try_acquire_for): Likewise. > (_M_try_acquire): Just call _M_try_acquire_for. > * testsuite/30_threads/semaphore/104928-2.cc: New test. > * testsuite/30_threads/semaphore/104928.cc: New test. > --- > > Tested x86_64-linux. > I have spent some time reading the changes, tests, and the bug report and the description of the bug and corresponding changes makes sense to me. > > libstdc++-v3/include/bits/semaphore_base.h | 74 +++++++++---- > .../30_threads/semaphore/104928-2.cc | 102 ++++++++++++++++++ > .../testsuite/30_threads/semaphore/104928.cc | 70 ++++++++++++ > 3 files changed, 223 insertions(+), 23 deletions(-) > create mode 100644 libstdc++-v3/testsuite/30_threads/semaphore/104928-2.cc > create mode 100644 libstdc++-v3/testsuite/30_threads/semaphore/104928.cc > > diff --git a/libstdc++-v3/include/bits/semaphore_base.h > b/libstdc++-v3/include/bits/semaphore_base.h > index 5b5a1c982317..526da4d82b6a 100644 > --- a/libstdc++-v3/include/bits/semaphore_base.h > +++ b/libstdc++-v3/include/bits/semaphore_base.h > @@ -71,7 +71,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > } > > static _GLIBCXX_ALWAYS_INLINE bool > - _S_do_try_acquire(__count_type* __counter, __count_type __old) > noexcept > + _S_do_try_acquire(__count_type* __counter, __count_type& __old) > noexcept > { > if (__old == 0) > return false; > @@ -82,51 +82,79 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > memory_order::relaxed); > } > > - _GLIBCXX_ALWAYS_INLINE void > + void > _M_acquire() noexcept > { > auto const __vfn = [this]{ return > _S_get_current(&this->_M_counter); }; > - auto const __pred = [this](__count_type __cur) { > - return _S_do_try_acquire(&this->_M_counter, __cur); > + auto __val = __vfn(); > + auto const __pred = [&__val](__count_type __cur) { > + if (__cur > 0) > + { > + __val = __cur; > + return true; > + } > + return false; > }; > - std::__atomic_wait_address(&_M_counter, __pred, __vfn, true); > + while (!_S_do_try_acquire(&_M_counter, __val)) > + if (__val == 0) > + std::__atomic_wait_address(&_M_counter, __pred, __vfn, true); > } > > bool > _M_try_acquire() noexcept > { > - auto const __vfn = [this]{ return > _S_get_current(&this->_M_counter); }; > - auto const __pred = [this](__count_type __cur) { > - return _S_do_try_acquire(&this->_M_counter, __cur); > - }; > - using __detail::__wait_clock_t; > - return std::__atomic_wait_address_for(&_M_counter, __pred, __vfn, > - __wait_clock_t::duration(), > - true); > + // The fastest implementation of this function is just > _S_do_try_acquire > + // but that can fail under contention even when _M_count > 0. > + // Using _M_try_acquire_for(0ns) will retry a few times in a loop. > + return _M_try_acquire_for(__detail::__wait_clock_t::duration{}); > } > > template<typename _Clock, typename _Duration> > - _GLIBCXX_ALWAYS_INLINE bool > + bool > _M_try_acquire_until(const chrono::time_point<_Clock, _Duration>& > __atime) noexcept > { > auto const __vfn = [this]{ return > _S_get_current(&this->_M_counter); }; > - auto const __pred = [this](__count_type __cur) { > - return _S_do_try_acquire(&this->_M_counter, __cur); > + auto __val = __vfn(); > + auto const __pred = [&__val](__count_type __cur) { > + if (__cur > 0) > + { > + __val = __cur; > + return true; > + } > + return false; > }; > - return std::__atomic_wait_address_until(&_M_counter, __pred, __vfn, > - __atime, true); > + while (!_S_do_try_acquire(&this->_M_counter, __val)) > + if (__val == 0) > + { > + if (!std::__atomic_wait_address_until(&_M_counter, __pred, > + __vfn, __atime, true)) > + return false; // timed out > + } > + return true; > } > > template<typename _Rep, typename _Period> > - _GLIBCXX_ALWAYS_INLINE bool > + bool > _M_try_acquire_for(const chrono::duration<_Rep, _Period>& __rtime) > noexcept > { > auto const __vfn = [this]{ return > _S_get_current(&this->_M_counter); }; > - auto const __pred = [this](__count_type __cur) { > - return _S_do_try_acquire(&this->_M_counter, __cur); > + auto __val = __vfn(); > + auto const __pred = [&__val](__count_type __cur) { > + if (__cur > 0) > + { > + __val = __cur; > + return true; > + } > + return false; > }; > - return std::__atomic_wait_address_for(&_M_counter, __pred, __vfn, > - __rtime, true); > + while (!_S_do_try_acquire(&this->_M_counter, __val)) > + if (__val == 0) > + { > + if (!std::__atomic_wait_address_for(&_M_counter, __pred, > + __vfn, __rtime, true)) > + return false; // timed out > + } > + return true; > } > > _GLIBCXX_ALWAYS_INLINE ptrdiff_t > diff --git a/libstdc++-v3/testsuite/30_threads/semaphore/104928-2.cc > b/libstdc++-v3/testsuite/30_threads/semaphore/104928-2.cc > new file mode 100644 > index 000000000000..62b42b14627f > --- /dev/null > +++ b/libstdc++-v3/testsuite/30_threads/semaphore/104928-2.cc > @@ -0,0 +1,102 @@ > +// { dg-do run { target c++20 } } > +// { dg-additional-options "-pthread" { target pthread } } > +// { dg-require-gthreads "" } > +// { dg-add-options libatomic } > +// { dg-options "-DSIMULATOR_TEST" { target simulator } } > You are adding this predefine here, but not using it later. > + > +// Bug libstdc++/104928 - std::counting_semaphore on Linux can sleep > forever > + > +#include <semaphore> > +#include <thread> > +#include <chrono> > +#include <atomic> > + > +std::binary_semaphore t1(1); > +std::binary_semaphore sem2(0); > +std::atomic<int> room1 = 0; > +int room2 = 0; > + > +std::atomic<bool> run{true}; > + > +enum class AcquireKind { Acquire, Try, TryFor }; > + > +template<std::ptrdiff_t N, AcquireKind Kind> > +struct Morris > +{ > + using Semaphore = std::counting_semaphore<N>; > + > + Semaphore sem1{1}; > + Semaphore sem2{0}; > + unsigned counter = 0; > + > + void operator()() > + { > + while (run) > + { > + room1 += 1; > + > + acquire(sem1); > + room2 += 1; > + room1 -= 1; > + if (room1 == 0) > + sem2.release(); > + else > + sem1.release(); > + > + acquire(sem2); > + room2 -= 1; > + > + // critical region > + ++counter; > + // end critical region > + > + if (room2 == 0) > + sem1.release(); > + else > + sem2.release(); > + } > + } > + > + void acquire(Semaphore& sem) > + { > + using enum AcquireKind; > + using namespace std::chrono; > + if constexpr (Kind == Acquire) > + sem.acquire(); > + else if constexpr (Kind == Try) > + while (!sem.try_acquire()) { } > + else if constexpr (Kind == TryFor) > + while (!sem.try_acquire_for(1h)) { } > + } > +}; > + > +template<std::ptrdiff_t N, AcquireKind Kind> > +void > +test_morris_kind() > +{ > + Morris<N, Kind> algo; > + std::thread t1(std::ref(algo)); > + std::thread t2(std::ref(algo)); > + std::this_thread::sleep_for(std::chrono::seconds(2)); > + run = false; > + t1.join(); > + t2.join(); > +} > + > +template<std::ptrdiff_t N> > +void > +test_morris() > +{ > + test_morris_kind<N, AcquireKind::Acquire>(); > + test_morris_kind<N, AcquireKind::Try>(); > + test_morris_kind<N, AcquireKind::TryFor>(); > +} > + > +int main() > +{ > + test_morris<1>(); // std::binary_semaphore > + test_morris<1000>(); // std::counting_semaphore that can use futex > +#if PTRDIFF_MAX > INT_MAX > + // test_morris<PTRDIFF_MAX>(); // std::counting_semaphore that cannot > use futex > +#endif > +} > diff --git a/libstdc++-v3/testsuite/30_threads/semaphore/104928.cc > b/libstdc++-v3/testsuite/30_threads/semaphore/104928.cc > new file mode 100644 > index 000000000000..f360da97ff3c > --- /dev/null > +++ b/libstdc++-v3/testsuite/30_threads/semaphore/104928.cc > @@ -0,0 +1,70 @@ > +// { dg-do run { target c++20 } } > +// { dg-additional-options "-pthread" { target pthread } } > +// { dg-require-gthreads "" } > +// { dg-add-options libatomic } > +// { dg-options "-DSIMULATOR_TEST" { target simulator } } > + > +// Bug libstdc++/104928 - std::counting_semaphore on Linux can sleep > forever > + > +#include <semaphore> > +#include <thread> > +#include <chrono> > +#include <climits> > + > +#ifdef SIMULATOR_TEST > +const int loop_count = 100; > +const int thread_count = 6; > +#else > +const int loop_count = 1000000; > +const int thread_count = 20; > +#endif > + > +template<std::ptrdiff_t N, typename Acquire> > +void > +test_acquire(Acquire acq_func) > +{ > + std::counting_semaphore<N * loop_count> s{0}; > + std::thread threads[thread_count]; > + for (int i = 0; i < thread_count; i += 2) { > + threads[i] = std::thread([&s, &acq_func]() { > + for (int i = 0; i < loop_count; ++i) > + acq_func(s); > + }); > + threads[i+1] = std::thread([&s]() { > + for (int i = 0; i < loop_count; ++i) > + s.release(); > + }); > + } > + for (auto& t : threads) > + t.join(); > +} > + > +template<typename Acquire> > +void > +test_all(Acquire f) > +{ > + const int max = INT_MAX / loop_count; > + test_acquire<max>(f); // can use futex > +#if PTRDIFF_MAX > INT_MAX > + test_acquire<max * 10>(f); // cannot use futex > +#endif > +} > + > +int main() > +{ > + test_all([](auto& sem) { sem.acquire(); }); > + > + test_all([](auto& sem) { while (!sem.try_acquire()) { } }); > + > + using namespace std::chrono; > + > + test_all([](auto& sem) { while (!sem.try_acquire_for(1h)) { } }); > + > + auto try_acquire_until = [](auto& sem, auto time) { > + while (!sem.try_acquire_until(time + 1h)) > + { } > + }; > + test_all([&](auto& sem) { try_acquire_until(sem, system_clock::now()); > }); > + test_all([&](auto& sem) { try_acquire_until(sem, steady_clock::now()); > }); > + test_all([&](auto& sem) { try_acquire_until(sem, utc_clock::now()); }); > +} > -- > 2.49.0 > >