On Tue, Jun 3, 2025 at 10:34 AM Jonathan Wakely <jwak...@redhat.com> wrote:

> There's a deadlock in std::counting_semaphore that occurs when the
> semaphore is under contention. The bug happens when one thread tries to
> acquire the mutex, calling __semaphore_base::_S_do_try_acquire to
> atomically decrement the counter using compare_exchange_strong. If the
> counter is non-zero (and so should be possible to decrement) but another
> thread changes it (either incrementing or decrementing it) then the
> compare_exchange fails and _S_do_try_acquire returns false. Because that
> function is used by the predicate passed to __atomic_wait_address, when
> it returns false the thread does a futex wait until the value changes.
> However, when the predicate is false because the compare_exchange failed
> due to not matching the expected value, waiting for the value to change
> is incorrect. The correct behaviour would be to retry the
> compare_exchange using the new value (as long as it's still non-zero).
> Waiting for the value to change again means we can block forever,
> because it might never change again.
>
> The predicate should only test the value, not also attempt to alter it,
> and its return value should mean only one thing, not conflate a busy
> semaphore that cannot be acquired with a contended one that can be
> acquired by retrying.
>
> The correct behaviour of __semaphore_base::_M_acquire is to attempt the
> decrement, but to retry immediately if it failed due to contention on
> the variable (i.e. due to the variable not having the expected value).
> It should only wait for the value to change when the value is zero,
> because that's the only time we can't decrement it.
>
> This commit moves the _S_do_try_acquire call out of the predicate and
> loops while it is false, only doing an atomic wait when the the
> counter's value is zero. The predicate used for the atomic wait now only
> checks whether the value is decrementable (non-zero), without also
> trying to perform that decrement.
>
> In order for the caller to tell whether to retry a failed
> _S_do_try_acquire or wait for the value to be non-zero, the value
> obtained by a failed compare_exchange needs to be passed back to the
> caller. _S_do_try_acquire is changed to take its parameter by reference,
> so that the caller gets the new value and can check whether it's zero.
>
> In order to avoid doing another atomic load after returning from an
> atomic wait, the predicate is also changed to take its parameter by
> reference, so that when a non-zero value is detected that value is
> available to _M_acquire (and can be passed to _S_do_try_acquire as the
> expected value of the compare_exchange). Although this means that the
> predicate is modifying data again, not just checking a value, this
> modification is safe. It's not changing the semaphore's counter, only
> changing a local variable in the caller to avoid a redundant atomic
> load.
>
> Equivalent changes are made to _M_try_acquire_until and
> _M_try_acquire_for. They have the same bug, although they can escape the
> deadlock if the wait is interrupted by timing out. For _M_acquire
> there's no time out so it potentially waits forever.
>
> _M_try_acquire also has the same bug, but can be simplified to just
> calling _M_try_acquire_for with a timeout of zero. That will result in
> calling __wait_impl with the __spin_only flag set, so that the value is
> loaded and checked in a spin loop but there is no sleep. This means that
> _M_try_acquire can still succeed under light contention if the counter
> is being changed concurrently, at the cost of a little extra overhead.
> It would be possible to implement _M_try_acquire as nothing more than an
> atomic load and a compare_exchange, but it would fail when there's is
> any contention.
>
> libstdc++-v3/ChangeLog:
>
>         PR libstdc++/104928
>         * include/bits/semaphore_base.h (_S_do_try_acquire): Take old
>         value by reference.
>         (_M_acquire): Move _S_do_try_acquire call out of the predicate
>         and loop on its result. Make the predicate capture and update
>         the local copy of the value.
>         (_M_try_acquire_until, _M_try_acquire_for): Likewise.
>         (_M_try_acquire): Just call _M_try_acquire_for.
>         * testsuite/30_threads/semaphore/104928-2.cc: New test.
>         * testsuite/30_threads/semaphore/104928.cc: New test.
> ---
>
> Tested x86_64-linux.
>
I have spent some time reading the changes, tests, and the bug report and
 the description of the bug and corresponding changes makes sense to me.


>
>  libstdc++-v3/include/bits/semaphore_base.h    |  74 +++++++++----
>  .../30_threads/semaphore/104928-2.cc          | 102 ++++++++++++++++++
>  .../testsuite/30_threads/semaphore/104928.cc  |  70 ++++++++++++
>  3 files changed, 223 insertions(+), 23 deletions(-)
>  create mode 100644 libstdc++-v3/testsuite/30_threads/semaphore/104928-2.cc
>  create mode 100644 libstdc++-v3/testsuite/30_threads/semaphore/104928.cc
>
> diff --git a/libstdc++-v3/include/bits/semaphore_base.h
> b/libstdc++-v3/include/bits/semaphore_base.h
> index 5b5a1c982317..526da4d82b6a 100644
> --- a/libstdc++-v3/include/bits/semaphore_base.h
> +++ b/libstdc++-v3/include/bits/semaphore_base.h
> @@ -71,7 +71,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>      }
>
>      static _GLIBCXX_ALWAYS_INLINE bool
> -    _S_do_try_acquire(__count_type* __counter, __count_type __old)
> noexcept
> +    _S_do_try_acquire(__count_type* __counter, __count_type& __old)
> noexcept
>      {
>        if (__old == 0)
>         return false;
> @@ -82,51 +82,79 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>                                                     memory_order::relaxed);
>      }
>
> -    _GLIBCXX_ALWAYS_INLINE void
> +    void
>      _M_acquire() noexcept
>      {
>        auto const __vfn = [this]{ return
> _S_get_current(&this->_M_counter); };
> -      auto const __pred = [this](__count_type __cur) {
> -       return _S_do_try_acquire(&this->_M_counter, __cur);
> +      auto __val = __vfn();
> +      auto const __pred = [&__val](__count_type __cur) {
> +       if (__cur > 0)
> +         {
> +           __val = __cur;
> +           return true;
> +         }
> +       return false;
>        };
> -      std::__atomic_wait_address(&_M_counter, __pred, __vfn, true);
> +      while (!_S_do_try_acquire(&_M_counter, __val))
> +       if (__val == 0)
> +         std::__atomic_wait_address(&_M_counter, __pred, __vfn, true);
>      }
>
>      bool
>      _M_try_acquire() noexcept
>      {
> -      auto const __vfn = [this]{ return
> _S_get_current(&this->_M_counter); };
> -      auto const __pred = [this](__count_type __cur) {
> -       return _S_do_try_acquire(&this->_M_counter, __cur);
> -      };
> -      using __detail::__wait_clock_t;
> -      return std::__atomic_wait_address_for(&_M_counter, __pred, __vfn,
> -                                           __wait_clock_t::duration(),
> -                                           true);
> +      // The fastest implementation of this function is just
> _S_do_try_acquire
> +      // but that can fail under contention even when _M_count > 0.
> +      // Using _M_try_acquire_for(0ns) will retry a few times in a loop.
> +      return _M_try_acquire_for(__detail::__wait_clock_t::duration{});
>      }
>
>      template<typename _Clock, typename _Duration>
> -      _GLIBCXX_ALWAYS_INLINE bool
> +      bool
>        _M_try_acquire_until(const chrono::time_point<_Clock, _Duration>&
> __atime) noexcept
>        {
>         auto const __vfn = [this]{ return
> _S_get_current(&this->_M_counter); };
> -       auto const __pred = [this](__count_type __cur) {
> -         return _S_do_try_acquire(&this->_M_counter, __cur);
> +       auto __val = __vfn();
> +       auto const __pred = [&__val](__count_type __cur) {
> +         if (__cur > 0)
> +           {
> +             __val = __cur;
> +             return true;
> +           }
> +         return false;
>         };
> -       return std::__atomic_wait_address_until(&_M_counter, __pred, __vfn,
> -                                               __atime, true);
> +       while (!_S_do_try_acquire(&this->_M_counter, __val))
> +         if (__val == 0)
> +           {
> +             if (!std::__atomic_wait_address_until(&_M_counter, __pred,
> +                                                   __vfn, __atime, true))
> +               return false; // timed out
> +           }
> +       return true;
>        }
>
>      template<typename _Rep, typename _Period>
> -      _GLIBCXX_ALWAYS_INLINE bool
> +      bool
>        _M_try_acquire_for(const chrono::duration<_Rep, _Period>& __rtime)
> noexcept
>        {
>         auto const __vfn = [this]{ return
> _S_get_current(&this->_M_counter); };
> -       auto const __pred = [this](__count_type __cur) {
> -         return _S_do_try_acquire(&this->_M_counter, __cur);
> +       auto __val = __vfn();
> +       auto const __pred = [&__val](__count_type __cur) {
> +         if (__cur > 0)
> +           {
> +             __val = __cur;
> +             return true;
> +           }
> +         return false;
>         };
> -       return std::__atomic_wait_address_for(&_M_counter, __pred, __vfn,
> -                                             __rtime, true);
> +       while (!_S_do_try_acquire(&this->_M_counter, __val))
> +         if (__val == 0)
> +           {
> +             if (!std::__atomic_wait_address_for(&_M_counter, __pred,
> +                                                 __vfn, __rtime, true))
> +               return false; // timed out
> +           }
> +       return true;
>        }
>
>      _GLIBCXX_ALWAYS_INLINE ptrdiff_t
> diff --git a/libstdc++-v3/testsuite/30_threads/semaphore/104928-2.cc
> b/libstdc++-v3/testsuite/30_threads/semaphore/104928-2.cc
> new file mode 100644
> index 000000000000..62b42b14627f
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/30_threads/semaphore/104928-2.cc
> @@ -0,0 +1,102 @@
> +// { dg-do run { target c++20 } }
> +// { dg-additional-options "-pthread" { target pthread } }
> +// { dg-require-gthreads "" }
> +// { dg-add-options libatomic }
> +// { dg-options "-DSIMULATOR_TEST" { target simulator } }
>
You are adding this predefine here, but not using it later.

> +
> +// Bug libstdc++/104928 - std::counting_semaphore on Linux can sleep
> forever
> +
> +#include <semaphore>
> +#include <thread>
> +#include <chrono>
> +#include <atomic>
> +
> +std::binary_semaphore t1(1);
> +std::binary_semaphore sem2(0);
> +std::atomic<int> room1 = 0;
> +int room2 = 0;
> +
> +std::atomic<bool> run{true};
> +
> +enum class AcquireKind { Acquire, Try, TryFor };
> +
> +template<std::ptrdiff_t N, AcquireKind Kind>
> +struct Morris
> +{
> +  using Semaphore = std::counting_semaphore<N>;
> +
> +  Semaphore sem1{1};
> +  Semaphore sem2{0};
> +  unsigned counter = 0;
> +
> +  void operator()()
> +  {
> +    while (run)
> +    {
> +      room1 += 1;
> +
> +      acquire(sem1);
> +      room2 += 1;
> +      room1 -= 1;
> +      if (room1 == 0)
> +       sem2.release();
> +      else
> +       sem1.release();
> +
> +      acquire(sem2);
> +      room2 -= 1;
> +
> +      // critical region
> +      ++counter;
> +      // end critical region
> +
> +      if (room2 == 0)
> +       sem1.release();
> +      else
> +       sem2.release();
> +    }
> +  }
> +
> +  void acquire(Semaphore& sem)
> +  {
> +    using enum AcquireKind;
> +    using namespace std::chrono;
> +    if constexpr (Kind == Acquire)
> +      sem.acquire();
> +    else if constexpr (Kind == Try)
> +      while (!sem.try_acquire()) { }
> +    else if constexpr (Kind == TryFor)
> +      while (!sem.try_acquire_for(1h)) { }
> +  }
> +};
> +
> +template<std::ptrdiff_t N, AcquireKind Kind>
> +void
> +test_morris_kind()
> +{
> +  Morris<N, Kind> algo;
> +  std::thread t1(std::ref(algo));
> +  std::thread t2(std::ref(algo));
> +  std::this_thread::sleep_for(std::chrono::seconds(2));
> +  run = false;
> +  t1.join();
> +  t2.join();
> +}
> +
> +template<std::ptrdiff_t N>
> +void
> +test_morris()
> +{
> +  test_morris_kind<N, AcquireKind::Acquire>();
> +  test_morris_kind<N, AcquireKind::Try>();
> +  test_morris_kind<N, AcquireKind::TryFor>();
> +}
> +
> +int main()
> +{
> +  test_morris<1>();    // std::binary_semaphore
> +  test_morris<1000>(); // std::counting_semaphore that can use futex
> +#if PTRDIFF_MAX > INT_MAX
> +  // test_morris<PTRDIFF_MAX>(); // std::counting_semaphore that cannot
> use futex
> +#endif
> +}
> diff --git a/libstdc++-v3/testsuite/30_threads/semaphore/104928.cc
> b/libstdc++-v3/testsuite/30_threads/semaphore/104928.cc
> new file mode 100644
> index 000000000000..f360da97ff3c
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/30_threads/semaphore/104928.cc
> @@ -0,0 +1,70 @@
> +// { dg-do run { target c++20 } }
> +// { dg-additional-options "-pthread" { target pthread } }
> +// { dg-require-gthreads "" }
> +// { dg-add-options libatomic }
> +// { dg-options "-DSIMULATOR_TEST" { target simulator } }
> +
> +// Bug libstdc++/104928 - std::counting_semaphore on Linux can sleep
> forever
> +
> +#include <semaphore>
> +#include <thread>
> +#include <chrono>
> +#include <climits>
> +
> +#ifdef SIMULATOR_TEST
> +const int loop_count = 100;
> +const int thread_count = 6;
> +#else
> +const int loop_count = 1000000;
> +const int thread_count = 20;
> +#endif
> +
> +template<std::ptrdiff_t N, typename Acquire>
> +void
> +test_acquire(Acquire acq_func)
> +{
> +  std::counting_semaphore<N * loop_count> s{0};
> +  std::thread threads[thread_count];
> +  for (int i = 0; i < thread_count; i += 2) {
> +    threads[i] = std::thread([&s, &acq_func]() {
> +      for (int i = 0; i < loop_count; ++i)
> +       acq_func(s);
> +    });
> +    threads[i+1] = std::thread([&s]() {
> +      for (int i = 0; i < loop_count; ++i)
> +       s.release();
> +    });
> +  }
> +  for (auto& t : threads)
> +    t.join();
> +}
> +
> +template<typename Acquire>
> +void
> +test_all(Acquire f)
> +{
> +  const int max = INT_MAX / loop_count;
> +  test_acquire<max>(f);  // can use futex
> +#if PTRDIFF_MAX > INT_MAX
> +  test_acquire<max * 10>(f); // cannot use futex
> +#endif
> +}
> +
> +int main()
> +{
> +  test_all([](auto& sem) { sem.acquire(); });
> +
> +  test_all([](auto& sem) { while (!sem.try_acquire()) { } });
> +
> +  using namespace std::chrono;
> +
> +  test_all([](auto& sem) { while (!sem.try_acquire_for(1h)) { } });
> +
> +  auto try_acquire_until = [](auto& sem, auto time) {
> +    while (!sem.try_acquire_until(time + 1h))
> +    { }
> +  };
> +  test_all([&](auto& sem) { try_acquire_until(sem, system_clock::now());
> });
> +  test_all([&](auto& sem) { try_acquire_until(sem, steady_clock::now());
> });
> +  test_all([&](auto& sem) { try_acquire_until(sem, utc_clock::now()); });
> +}
> --
> 2.49.0
>
>

Reply via email to