https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104928
Bug ID: 104928 Summary: std::counting_semaphore on Linux can sleep forever Product: gcc Version: 11.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: angelsl at in04 dot sg Target Milestone: --- The __atomic_semaphore implementation of std::counting_semaphore can sometimes sleep forever, especially when there is high contention. Here is a possible sequence of events that can lead to the issue: 1. _M_counter is 1. 2. thread A enters (semaphore_base.h) _M_acquire -> (atomic_wait.h) __atomic_wait_address_bare -> _S_do_spin. 3. In _S_do_spin, thread A does __atomic_load on _M_counter and reads 1. 4. A different thread manages to decrement the semaphore. _M_counter is now 0. 5. Thread A enters __atomic_spin, and all calls to __pred (which just does _S_do_try_acquire) fail, since _M_counter is now 0. Thread A returns from __atomic_spin and _S_do_spin with false. 6. Another thread signals the semaphore. _M_counter is back to 1. 7. Thread A returns to __atomic_wait_address_bare and proceeds to call __platform_wait using the value of _M_counter read in step 3 (= 1). 8. In __platform_wait, thread A calls futex on _M_counter with WAIT_PRIVATE and value 1. Since _M_counter is still 1, the kernel puts thread A to sleep. 9. Unless the semaphore later reaches 0 again, thread A will never be awoken since _M_release does not call futex WAKE unless the counter is 0. This is sort of an ABA problem. I think the most proper solution would be to make it so that futex is called with value 0 (since that is what we are waiting for _M_counter to change from). But it will need some restructuring to the current abstractions over futex that are being used here. Here is an almost 100%-reliable minimal reproducible example for the issue: --- // g++ -O2 -std=c++20 -o hang hang.cpp #include <iostream> #include <semaphore> #include <thread> #include <vector> int main(int argc, char *argv[]) { std::counting_semaphore s{0}; std::vector<std::thread> threads; for (size_t i = 0; i < 10; ++i) { threads.emplace_back([&s]() { for (size_t i = 0; i < 1000000; ++i) { s.acquire(); } }); threads.emplace_back([&s]() { for (size_t i = 0; i < 1000000; ++i) { s.release(); } }); } for (auto &t : threads) t.join(); return 0; } --- If you run it, you will observe one or two acquirer threads sleeping indefinitely at the end. If you observe them in gdb, you will see that they are waiting on futex, and also that the semaphore counter is actually positive. Other notes: - This probably affects the implementation of atomic::wait as well, but I have not tried that. - FWIW, libc++ also has this exact issue, but they sidestep it by passing a timeout to futex(). I don't know whether specifying a timeout was done intentionally to avoid this issue. I also don't personally think that that is a great solution to this. (If you run the MRE with clang++ -stdlib=libc++, you will observe the issue manifesting as some acquirer threads sleeping for a few seconds until the futex timeout expires.) System information: Target: x86_64-pc-linux-gnu Configured with: /build/gcc/src/gcc/configure --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --enable-bootstrap --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-werror --with-build-config=bootstrap-lto --enable-link-serialization=1 gdc_include_dir=/usr/include/dlang/gdc Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 11.2.0 (GCC) I have attached the preprocessed version of the above MRE.