On 21/04/21 10:10 -0700, Thomas Rodgers wrote:
From: Thomas Rodgers <rodg...@twrodgers.com>

A change was made to __atomic_semaphore::_S_do_try_acquire() to
(ideally) let the compare_exchange reload the value of __old rather than
always reloading it twice. This causes _M_acquire to spin indefinitely
if the value of __old is already 0.

libstdc++/ChangeLog:
        * include/bits/semaphore_base.h: Always reload __old in
        __atomic_semaphore::_S_do_try_acquire().
        * testsuite/30_threads/stop_token/stop_callback/destroy.cc
        re-enable testcase.
---
libstdc++-v3/include/bits/semaphore_base.h       | 16 ++++++----------
.../stop_token/stop_callback/destroy.cc          |  2 --
2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/libstdc++-v3/include/bits/semaphore_base.h 
b/libstdc++-v3/include/bits/semaphore_base.h
index 35469e443b0..84b33423fff 100644
--- a/libstdc++-v3/include/bits/semaphore_base.h
+++ b/libstdc++-v3/include/bits/semaphore_base.h
@@ -196,9 +196,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
    __atomic_semaphore& operator=(const __atomic_semaphore&) = delete;

    static _GLIBCXX_ALWAYS_INLINE bool
-    _S_do_try_acquire(__detail::__platform_wait_t* __counter,
-                     __detail::__platform_wait_t& __old) noexcept
+    _S_do_try_acquire(__detail::__platform_wait_t* __counter) noexcept
    {
+      auto __old = __atomic_impl::load(__counter, memory_order::acquire);
      if (__old == 0)
        return false;

@@ -211,18 +211,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
    _GLIBCXX_ALWAYS_INLINE void
    _M_acquire() noexcept
    {
-      auto __old = __atomic_impl::load(&_M_counter, memory_order::acquire);
      auto const __pred =
-       [this, &__old] { return _S_do_try_acquire(&this->_M_counter, __old); };
+       [this] { return _S_do_try_acquire(&this->_M_counter); };
      std::__atomic_wait_address_bare(&_M_counter, __pred);
    }

    bool
    _M_try_acquire() noexcept
    {
-      auto __old = __atomic_impl::load(&_M_counter, memory_order::acquire);
      auto const __pred =
-       [this, &__old] { return _S_do_try_acquire(&this->_M_counter, __old); };
+       [this] { return _S_do_try_acquire(&this->_M_counter); };
      return std::__detail::__atomic_spin(__pred);
    }

@@ -231,9 +229,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
      _M_try_acquire_until(const chrono::time_point<_Clock,
                           _Duration>& __atime) noexcept
      {
-       auto __old = __atomic_impl::load(&_M_counter, memory_order_relaxed);
        auto const __pred =
-         [this, &__old] { return _S_do_try_acquire(&this->_M_counter, __old); 
};
+         [this] { return _S_do_try_acquire(&this->_M_counter); };

        return __atomic_wait_address_until_bare(&_M_counter, __pred, __atime);
      }
@@ -243,9 +240,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
      _M_try_acquire_for(const chrono::duration<_Rep, _Period>& __rtime)
        noexcept
      {
-       auto __old = __atomic_impl::load(&_M_counter, memory_order_relaxed);
        auto const __pred =
-         [this, &__old] { return _S_do_try_acquire(&this->_M_counter, __old); 
};
+         [this] { return _S_do_try_acquire(&this->_M_counter); };

        return __atomic_wait_address_for_bare(&_M_counter, __pred, __rtime);
      }

As discussed on IRC (and repeating here for my benefit when I forget
how this works again) the bug is in _M_acquire() because my suggestion
to not refresh the __old value with an atomic_load on every loop means
we fail to meet the requirement in the second bullet here:

  -15- Effects: Repeatedly performs the following steps, in order:
  (15.1) — Evaluates try_acquire. If the result is true, returns.
  (15.2) — Blocks on *this until counter is greater than zero.

If we don't re-load the value, then we always see the initial zero,
and every call to _S_do_try_acquire() returns false immediately.

An alternative fix would be to keep the caching of the value in the
predicate for the try_acquire{,_for_until} functions, but refresh it
for acquire, like so:

--- a/libstdc++-v3/include/bits/semaphore_base.h
+++ b/libstdc++-v3/include/bits/semaphore_base.h
@@ -223,9 +223,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     _GLIBCXX_ALWAYS_INLINE void
     _M_acquire() noexcept
     {
-      auto __old = __atomic_impl::load(&_M_counter, memory_order::acquire);
-      auto const __pred =
-       [this, &__old] { return _S_do_try_acquire(&this->_M_counter, __old); };
+      auto const __pred = [this] {
+           auto __old = __atomic_impl::load(&_M_counter, 
memory_order::acquire);
+           return _S_do_try_acquire(&this->_M_counter, __old);
+      };
       std::__atomic_wait_address_bare(&_M_counter, __pred);
     }
This fixes the stop_callback/destroy.cc hang for me on x86-32.

Refreshing the value on every loop for try_acquire{,_for,_until} is
not necessary for correctness, because the standard says:

  An implementation should ensure that try_acquire does not consistently
  return false in the absence of contending semaphore operations.

But refreshing it does reduce the likelihood of failing to acquire the
semaphore if the counter is repeatedly being updated by other threads
(e.g. another thread doing acquire/release in a loop). It's unclear
whether the cost of the extra loads is worth it to reduce the chance
of starvation in this case.

Please commit your patch to trunk, since that's what you had in your
original patch before I asked you to change it (causing the bug).

We should do this for gcc-11 too if an RM approves it, since acquire()
is currently broken.

Reply via email to