https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104928

            Bug ID: 104928
           Summary: std::counting_semaphore on Linux can sleep forever
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: angelsl at in04 dot sg
  Target Milestone: ---

The __atomic_semaphore implementation of std::counting_semaphore can sometimes
sleep forever, especially when there is high contention.

Here is a possible sequence of events that can lead to the issue:

1. _M_counter is 1.
2. thread A enters (semaphore_base.h) _M_acquire -> (atomic_wait.h)
__atomic_wait_address_bare -> _S_do_spin.
3. In _S_do_spin, thread A does __atomic_load on _M_counter and reads 1.
4. A different thread manages to decrement the semaphore. _M_counter is now 0.
5. Thread A enters __atomic_spin, and all calls to __pred (which just does
_S_do_try_acquire) fail, since _M_counter is now 0. Thread A returns from
__atomic_spin and _S_do_spin with false.
6. Another thread signals the semaphore. _M_counter is back to 1.
7. Thread A returns to __atomic_wait_address_bare and proceeds to call
__platform_wait using the value of _M_counter read in step 3 (= 1).
8. In __platform_wait, thread A calls futex on _M_counter with WAIT_PRIVATE and
value 1. Since _M_counter is still 1, the kernel puts thread A to sleep.
9. Unless the semaphore later reaches 0 again, thread A will never be awoken
since _M_release does not call futex WAKE unless the counter is 0.

This is sort of an ABA problem. I think the most proper solution would be to
make it so that futex is called with value 0 (since that is what we are waiting
for _M_counter to change from). But it will need some restructuring to the
current abstractions over futex that are being used here.

Here is an almost 100%-reliable minimal reproducible example for the issue:

---
// g++ -O2 -std=c++20 -o hang hang.cpp
#include <iostream>
#include <semaphore>
#include <thread>
#include <vector>

int main(int argc, char *argv[]) {
  std::counting_semaphore s{0};
  std::vector<std::thread> threads;
  for (size_t i = 0; i < 10; ++i) {
    threads.emplace_back([&s]() {
      for (size_t i = 0; i < 1000000; ++i) {
        s.acquire();
      }
    });
    threads.emplace_back([&s]() {
      for (size_t i = 0; i < 1000000; ++i) {
        s.release();
      }
    });
  }
  for (auto &t : threads) t.join();
  return 0;
}
---

If you run it, you will observe one or two acquirer threads sleeping
indefinitely at the end. If you observe them in gdb, you will see that they are
waiting on futex, and also that the semaphore counter is actually positive.

Other notes:
- This probably affects the implementation of atomic::wait as well, but I have
not tried that.
- FWIW, libc++ also has this exact issue, but they sidestep it by passing a
timeout to futex(). I don't know whether specifying a timeout was done
intentionally to avoid this issue. I also don't personally think that that is a
great solution to this. (If you run the MRE with clang++ -stdlib=libc++, you
will observe the issue manifesting as some acquirer threads sleeping for a few
seconds until the futex timeout expires.)

System information:

Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure
--enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --enable-bootstrap
--prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/
--with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit
--enable-cet=auto --enable-checking=release --enable-clocale=gnu
--enable-default-pie --enable-default-ssp --enable-gnu-indirect-function
--enable-gnu-unique-object --enable-linker-build-id --enable-lto
--enable-multilib --enable-plugin --enable-shared --enable-threads=posix
--disable-libssp --disable-libstdcxx-pch --disable-werror
--with-build-config=bootstrap-lto --enable-link-serialization=1
gdc_include_dir=/usr/include/dlang/gdc
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.2.0 (GCC)

I have attached the preprocessed version of the above MRE.

Reply via email to