On Sun, Nov 16, 2025 at 1:56 AM Jonathan Wakely <[email protected]> wrote:
> This will allow us to extend atomic waiting functions to support a
> possible future 64-bit version of futex, as well as supporting
> futex-like wait/wake primitives on other targets (e.g. macOS has
> os_sync_wait_on_address and FreeBSD has _umtx_op).
>
> Before this change, the decision of whether to do a proxy wait or to
> wait on the atomic variable itself was made in the header at
> compile-time, which makes it an ABI property that would not have been
> possible to change later. That would have meant that
> std::atomic<uint64_t> would always have to do a proxy wait even if Linux
> gains support for 64-bit futex2(2) calls at some point in the future.
> The disadvantage of proxy waits is that several distinct atomic objects
> can share the same proxy state, leading to contention between threads
> even when they are not waiting on the same atomic object, similar to
> false sharing. It also results in spurious wake-ups, because doing a
> notify on an atomic object that uses a proxy wait will wake all waiters
> sharing the proxy.
>
> For types that are known to definitely not need a proxy wait (e.g. int
> on Linux) the header can still choose a more efficient path at
> compile-time. But for other types, the decision of whether to do a proxy
> wait is deferred to runtime, inside the library internals. This will
> make it possible for future versions of libstdc++.so to extend the set
> of types which don't need to use proxy waits, without ABI changes.
>
> The way the change works is to stop using the __proxy_wait flag that was
> set by the inline code in the headers. Instead the __wait_args struct
> has an extra pointer member which the library internals populate with
> either the address of the atomic object or the _M_ver counter in the
> proxy state. There is also a new _M_obj_size member which stores the
> size of the atomic object, so that the library can decide whether a
> proxy is needed. So for example if Linux gains 64-bit futex support then
> the library can decide not to use a proxy when _M_obj_size == 8.
> Finally, the _M_old member of the __wait_args struct is changed to
> uint64_t so that it has room to store 64-bit values, not just whatever
> size the __platform_wait_t type is (which is a 32-bit int on Linux).
> Similarly, the _M_val member of __wait_result_type changes to uint64_t
> too.
>
> libstdc++-v3/ChangeLog:
>
> * config/abi/pre/gnu.ver: Adjust exports.
> * include/bits/atomic_timed_wait.h
> (_GLIBCXX_HAVE_PLATFORM_TIMED_WAIT):
> Do not define this macro.
> (__atomic_wait_address_until_v, __atomic_wait_address_for_v):
> Guard assertions with #ifdef _GLIBCXX_UNKNOWN_PLATFORM_WAIT.
> * include/bits/atomic_wait.h (__platform_wait_uses_type):
> Define separately for platforms with and without platform
> wait.
> (_GLIBCXX_HAVE_PLATFORM_WAIT): Do not define this macro.
> (_GLIBCXX_UNKNOWN_PLATFORM_WAIT): Define new macro.
> (__wait_value_type): New typedef.
> (__wait_result_type): Change _M_val to __wait_value_type.
> (__wait_args_base::_M_old): Change type to __wait_value_type.
> (__wait_args_base::_M_obj, __wait_args_base::_M_obj_size): New
> data members.
> (__wait_args::__wait_args): Set _M_obj and _M_obj_size on
> construction.
> (__wait_args::_M_setup_wait): Change void* parameter to deduced
> type. Use _S_bit_cast instead of __builtin_bit_cast.
> (__wait_args::_M_load_proxy_wait_val): Remove function, replace
> with ...
> (__wait_args::_M_setup_wait_impl): New function.
> (__wait_args::_S_bit_cast): Wrapper for __builtin_bit_cast which
> also supports conversion from 32-bit values.
> (__wait_args::_S_flags_for): Do not set __proxy_wait flag.
> (__atomic_wait_address_v): Guard assertions with #ifdef
> _GLIBCXX_UNKNOWN_PLATFORM_WAIT.
> * src/c++20/atomic.cc (_GLIBCXX_HAVE_PLATFORM_WAIT): Define here
> instead of in header. Check _GLIBCXX_HAVE_PLATFORM_WAIT instead
> of _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT.
> (__spin_impl): Adjust for 64-bit __wait_args_base::_M_old.
> (use_proxy_wait): New function.
> (__wait_args::_M_load_proxy_wait_val): Replace with ...
> (__wait_args::_M_setup_wait_impl): New function. Call
> use_proxy_wait to decide at runtime whether to wait on the
> pointer directly instead of using a proxy. If a proxy is needed,
> set _M_obj to point to its _M_ver member. Adjust for change to
> type of _M_old.
> (__wait_impl): Wait on _M_obj unconditionally.
> (__notify_impl): Call use_proxy_wait to decide whether to notify
> on the address parameter or a proxy.
> (__spin_until_impl): Adjust for change to type of _M_val.
> (__wait_until_impl): Wait on _M_obj unconditionally.
> ---
>
> Tested x86_64-linux, powerpc64le-linux, sparc-solaris.
>
A lot of comments below.
>
> I think this is an important change which I unfortunately didn't think of
> until recently.
>
> This changes the exports from the shared library, but we're still in
> stage 1 so I think that should be allowed (albeit unfortunate). Nobody
> should be expecting GCC 16 to be stable yet.
>
> The __proxy_wait enumerator is now unused and could be removed. The
> __abi_version enumerator could also be bumped to indicate the
> incompatibility with earlier snapshots of GCC 16, but I don't think that
> is needed. We could in theory keep the old symbol export
> (__wait_args::_M_load_proxy_wait_val) and make it trap/abort if called, but
> I'd prefer to just remove it and cause dynamic linker errors instead.
>
> There's a TODO in the header about which types should be allowed to take
> the optimized paths (see the __waitable concept). For types where that's
> true, if the size matches a futex then we'll use a futex, even if it's
> actually an enum or floating-point type (or pointer on 32-bit targets).
> I'm not sure if that's safe.
>
>
> libstdc++-v3/config/abi/pre/gnu.ver | 3 +-
> libstdc++-v3/include/bits/atomic_timed_wait.h | 12 +-
> libstdc++-v3/include/bits/atomic_wait.h | 109 +++++++++-----
> libstdc++-v3/src/c++20/atomic.cc | 140 +++++++++++-------
> 4 files changed, 166 insertions(+), 98 deletions(-)
>
> diff --git a/libstdc++-v3/config/abi/pre/gnu.ver
> b/libstdc++-v3/config/abi/pre/gnu.ver
> index 2e48241d51f9..3c2bd4921730 100644
> --- a/libstdc++-v3/config/abi/pre/gnu.ver
> +++ b/libstdc++-v3/config/abi/pre/gnu.ver
> @@ -2553,7 +2553,8 @@ GLIBCXX_3.4.35 {
> _ZNSt8__detail11__wait_implEPKvRNS_16__wait_args_baseE;
> _ZNSt8__detail13__notify_implEPKvbRKNS_16__wait_args_baseE;
>
>
> _ZNSt8__detail17__wait_until_implEPKvRNS_16__wait_args_baseERKNSt6chrono8durationI[lx]St5ratioIL[lx]1EL[lx]1000000000EEEE;
> - _ZNSt8__detail11__wait_args22_M_load_proxy_wait_valEPKv;
> + _ZNSt8__detail11__wait_args18_M_setup_wait_implEPKv;
> + _ZNSt8__detail11__wait_args20_M_setup_notify_implEPKv;
>
> # std::chrono::gps_clock::now, tai_clock::now
> _ZNSt6chrono9gps_clock3nowEv;
> diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h
> b/libstdc++-v3/include/bits/atomic_timed_wait.h
> index 30f7ff616840..918a267d10eb 100644
> --- a/libstdc++-v3/include/bits/atomic_timed_wait.h
> +++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
> @@ -75,14 +75,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> return chrono::ceil<__w_dur>(__atime);
> }
>
> -#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
> -#define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> -#else
> -// define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT and implement
> __platform_wait_until
> -// if there is a more efficient primitive supported by the platform
> -// (e.g. __ulock_wait) which is better than pthread_cond_clockwait.
> -#endif // ! HAVE_LINUX_FUTEX
> -
> __wait_result_type
> __wait_until_impl(const void* __addr, __wait_args_base& __args,
> const __wait_clock_t::duration& __atime);
> @@ -156,7 +148,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> const chrono::time_point<_Clock, _Dur>&
> __atime,
> bool __bare_wait = false) noexcept
> {
> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> +#ifdef _GLIBCXX_UNKNOWN_PLATFORM_WAIT
> __glibcxx_assert(false); // This function can't be used for proxy
> wait.
> #endif
> __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
> @@ -208,7 +200,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> const chrono::duration<_Rep, _Period>&
> __rtime,
> bool __bare_wait = false) noexcept
> {
> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> +#ifdef _GLIBCXX_UNKNOWN_PLATFORM_WAIT
>
This name reads strangely, and sounds like something with a "TODO" attached.
I think _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT was a perfectly fine name, even
if it was not used directly.
> __glibcxx_assert(false); // This function can't be used for proxy
> wait.
> #endif
> __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
> diff --git a/libstdc++-v3/include/bits/atomic_wait.h
> b/libstdc++-v3/include/bits/atomic_wait.h
> index 95151479c120..49369419d6a6 100644
> --- a/libstdc++-v3/include/bits/atomic_wait.h
> +++ b/libstdc++-v3/include/bits/atomic_wait.h
> @@ -45,35 +45,34 @@
> namespace std _GLIBCXX_VISIBILITY(default)
> {
> _GLIBCXX_BEGIN_NAMESPACE_VERSION
> +#if defined _GLIBCXX_HAVE_LINUX_FUTEX
> namespace __detail
> {
> -#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
> -#define _GLIBCXX_HAVE_PLATFORM_WAIT 1
> using __platform_wait_t = int;
> inline constexpr size_t __platform_wait_alignment = 4;
> + }
> + template<typename _Tp>
> + inline constexpr bool __platform_wait_uses_type
> + = is_scalar_v<_Tp> && sizeof(_Tp) == sizeof(int) && alignof(_Tp) >=
> 4;
> #else
> +# define _GLIBCXX_UNKNOWN_PLATFORM_WAIT 1
> // define _GLIBCX_HAVE_PLATFORM_WAIT and implement __platform_wait()
> // and __platform_notify() if there is a more efficient primitive
> supported
> // by the platform (e.g. __ulock_wait()/__ulock_wake()) which is better
> than
> // a mutex/condvar based wait.
> + namespace __detail
> + {
> # if ATOMIC_LONG_LOCK_FREE == 2
> using __platform_wait_t = unsigned long;
> # else
> using __platform_wait_t = unsigned int;
> # endif
> inline constexpr size_t __platform_wait_alignment
> - = __alignof__(__platform_wait_t);
> -#endif
> + = sizeof(__platform_wait_t) < __alignof__(__platform_wait_t)
> + ? __alignof__(__platform_wait_t) : sizeof(__platform_wait_t);
> } // namespace __detail
> -
> - template<typename _Tp>
> - inline constexpr bool __platform_wait_uses_type
> -#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> - = is_scalar_v<_Tp>
> - && ((sizeof(_Tp) == sizeof(__detail::__platform_wait_t))
> - && (alignof(_Tp) >= __detail::__platform_wait_alignment));
> -#else
> - = false;
> + template<typename>
> + inline constexpr bool __platform_wait_uses_type = false;
> #endif
>
> namespace __detail
> @@ -105,10 +104,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> return __builtin_memcmp(&__a, &__b, sizeof(_Tp)) == 0;
> }
>
> - // lightweight std::optional<__platform_wait_t>
> + // TODO: this needs to be false for types with padding, e.g. __int20.
>
I do not understand why this needs to be required. This function is used
only via atomic or atomic_ref. For atomic, we can guarantee that the type
has its padding bytes cleared.
> + // TODO: should this be true only for integral, enum, and pointer
> types?
>
What I think is missing here is alignment. I assume that any platform wait
may use bits that are clear due to the alignment of the platform wait type
for some internal state. Or are we going to check for sufficient alignment
in the .cc file, and use a different kind of wait depending on the object?
But I think we can later safely extend or change what is waitable (except
extending it past 8 bytes), because once we start setting _M_obj_size to
non-zero, the implementation may use a platform wait. So I would go with
the safe option: integral, pointer, or enum types. That would also give us
a no-padding guarantee, I assume?
> + template<typename _Tp>
> + concept __waitable
> + = is_scalar_v<_Tp> && (sizeof(_Tp) <= sizeof(__UINT64_TYPE__));
> +
> + // Storage for up to 64 bits of value, should be considered opaque
> bits.
> + using __wait_value_type = __UINT64_TYPE__;
> +
> + // lightweight std::optional<__wait_value_type>
> struct __wait_result_type
> {
> - __platform_wait_t _M_val;
> + __wait_value_type _M_val;
> unsigned char _M_has_val : 1; // _M_val value was loaded before
> return.
> unsigned char _M_timeout : 1; // Waiting function ended with
> timeout.
> unsigned char _M_unused : 6; // padding
> @@ -143,8 +151,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> {
> __wait_flags _M_flags;
> int _M_order = __ATOMIC_ACQUIRE;
> - __platform_wait_t _M_old = 0;
> + __wait_value_type _M_old{};
> void* _M_wait_state = nullptr;
> + const void* _M_obj = nullptr; // The address of the object to wait
> on.
> + unsigned char _M_obj_size = 0; // The size of that object.
>
> // Test whether _M_flags & __flags is non-zero.
> bool
> @@ -162,36 +172,48 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> explicit
> __wait_args(const _Tp* __addr, bool __bare_wait = false) noexcept
> : __wait_args_base{ _S_flags_for(__addr, __bare_wait) }
> - { }
> + {
> + _M_obj = __addr; // Might be replaced by _M_setup_wait
> + if constexpr (__waitable<_Tp>)
> + // __wait_impl might be able to wait directly on __addr
> + // instead of using a proxy, depending on its size.
> + _M_obj_size = sizeof(_Tp);
> + }
>
> __wait_args(const __platform_wait_t* __addr, __platform_wait_t
> __old,
> int __order, bool __bare_wait = false) noexcept
> - : __wait_args_base{ _S_flags_for(__addr, __bare_wait), __order,
> __old }
> - { }
> + : __wait_args(__addr, __bare_wait)
> + {
> + _M_order = __order;
> + _M_old = __old;
> + }
>
> __wait_args(const __wait_args&) noexcept = default;
> __wait_args& operator=(const __wait_args&) noexcept = default;
>
> - template<typename _ValFn,
> - typename _Tp = decay_t<decltype(std::declval<_ValFn&>()())>>
> + template<typename _Tp, typename _ValFn>
> _Tp
> - _M_setup_wait(const void* __addr, _ValFn __vfn,
> + _M_setup_wait(const _Tp* __addr, _ValFn __vfn,
> __wait_result_type __res = {})
> {
> + static_assert(is_same_v<_Tp, decay_t<decltype(__vfn())>>);
> +
> if constexpr (__platform_wait_uses_type<_Tp>)
> {
> - // If the wait is not proxied, the value we check when
> waiting
> - // is the value of the atomic variable itself.
> + // If we know for certain that this type can be waited on
> + // efficiently using something like a futex syscall,
> + // then we can avoid the overhead of _M_setup_wait_impl
> + // and just load the value and store it into _M_old.
>
> - if (__res._M_has_val) // The previous wait loaded a recent
> value.
> + if (__res._M_has_val) // A previous wait loaded a recent
> value.
> {
> _M_old = __res._M_val;
> - return __builtin_bit_cast(_Tp, __res._M_val);
> + return _S_bit_cast<_Tp>(_M_old);
>
I am not sure that I understand which branch of _S_bit_cast this would use:
we have neither sizeof(_To) == sizeof(_From) (it is 4 vs 8) nor
sizeof(_From) == sizeof(__UINT32_TYPE__). I would much prefer this to be
done as:
return __builtin_bit_cast(_Tp, static_cast<__UINT32_TYPE__>(_M_old));
> }
> else // Load the value from __vfn
> {
> - _Tp __val = __vfn();
> - _M_old = __builtin_bit_cast(__platform_wait_t, __val);
> + auto __val = __vfn();
> + _M_old = _S_bit_cast<__wait_value_type>(__val);
>
And here:
_M_old = __builtin_bit_cast(__UINT32_TYPE__, __val);
However, instead of __UINT32_TYPE__, we should use
make_unsigned_t<__platform_wait_t>.
> return __val;
> }
> }
> @@ -199,16 +221,33 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> {
> if (__res._M_has_val) // The previous wait loaded a recent
> value.
> _M_old = __res._M_val;
> - else // Load _M_ver from the proxy (must happen before
> __vfn()).
> - _M_load_proxy_wait_val(__addr);
> + else // Let the library decide how to setup the wait.
> + {
> + // Set _M_obj to the address to be waited on (either
> __addr
> + // or a proxy) and load its current value into _M_old.
> + _M_setup_wait_impl(__addr);
> + }
> return __vfn();
> }
> }
>
> private:
> - // Populates _M_wait_state and _M_old from the proxy for __addr.
> + template<typename _To, typename _From>
> + [[__gnu__::__always_inline__]]
> + static _To
> + _S_bit_cast(_From __from)
> + {
> + if constexpr (sizeof(_To) == sizeof(_From))
> + return __builtin_bit_cast(_To, __from);
> + else if constexpr (sizeof(_From) == sizeof(__UINT32_TYPE__))
> + return __builtin_bit_cast(__UINT32_TYPE__, __from);
> + else
> + __builtin_unreachable();
> + }
> +
> + // Populates _M_wait_state and _M_old appropriately for __addr.
> void
> - _M_load_proxy_wait_val(const void* __addr);
> + _M_setup_wait_impl(const void* __addr);
>
> template<typename _Tp>
> static constexpr __wait_flags
> @@ -218,8 +257,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> __wait_flags __res = __abi_version | __do_spin;
> if (!__bare_wait)
> __res |= __track_contention;
> - if constexpr (!__platform_wait_uses_type<_Tp>)
> - __res |= __proxy_wait;
> + // if constexpr (!__platform_wait_uses_type<_Tp>)
> + // __res |= __proxy_wait;
>
Just remove this.
> return __res;
> }
> };
> @@ -255,7 +294,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> __detail::__platform_wait_t __old,
> int __order, bool __bare_wait = false)
> {
> -#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
> +#ifdef _GLIBCXX_UNKNOWN_PLATFORM_WAIT
> __glibcxx_assert(false); // This function can't be used for proxy
> wait.
> #endif
> __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
> diff --git a/libstdc++-v3/src/c++20/atomic.cc
> b/libstdc++-v3/src/c++20/atomic.cc
> index e280045b619d..d5fbd10ed11f 100644
> --- a/libstdc++-v3/src/c++20/atomic.cc
> +++ b/libstdc++-v3/src/c++20/atomic.cc
> @@ -38,14 +38,7 @@
> # include <syscall.h>
> # include <bits/functexcept.h>
> # include <sys/time.h>
> -#endif
> -
> -#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> -# ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> -// __waitable_state assumes that we consistently use the same
> implementation
> -// (i.e. futex vs mutex+condvar) for timed and untimed waiting.
> -# error "This configuration is not currently supported"
> -# endif
> +# define _GLIBCXX_HAVE_PLATFORM_WAIT 1
> #endif
>
> #pragma GCC diagnostic ignored "-Wmissing-field-initializers"
> @@ -107,7 +100,7 @@ namespace
> // Count of threads blocked waiting on this state.
> alignas(_S_align) __platform_wait_t _M_waiters = 0;
>
> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> +#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
> mutex _M_mtx;
>
> // This type meets the Cpp17BasicLockable requirements.
> @@ -123,7 +116,7 @@ namespace
> // use this for waiting and notifying functions instead.
> alignas(_S_align) __platform_wait_t _M_ver = 0;
>
> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> +#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
> __condvar _M_cv;
> #endif
>
> @@ -216,17 +209,19 @@ namespace
> __spin_impl(const __platform_wait_t* __addr, const __wait_args_base&
> __args)
> {
> __platform_wait_t __val{};
> + __wait_value_type wval;
> for (auto __i = 0; __i < __atomic_spin_count; ++__i)
> {
> __atomic_load(__addr, &__val, __args._M_order);
> - if (__val != __args._M_old)
> - return { ._M_val = __val, ._M_has_val = true, ._M_timeout =
> false };
> + wval = static_cast<__wait_value_type>(__val);
> + if (wval != __args._M_old)
> + return { ._M_val = wval, ._M_has_val = true, ._M_timeout = false
> };
> if (__i < __atomic_spin_count_relax)
> __thread_relax();
> else
> __thread_yield();
> }
> - return { ._M_val = __val, ._M_has_val = true, ._M_timeout = true };
> + return { ._M_val = wval, ._M_has_val = true, ._M_timeout = true };
> }
>
> inline __waitable_state*
> @@ -237,32 +232,58 @@ namespace
> return static_cast<__waitable_state*>(args._M_wait_state);
> }
>
> + [[gnu::always_inline]]
> + inline bool
> + use_proxy_wait(const __wait_args_base& args)
>
This needs to take the address that we are waiting on and check if it's
sufficiently aligned
to __platform_wait_aligment.
However, I do not think we should check unit64_t at all, as we do not have
such platform
wait yet, and we will not change __platform_wait_uses_type in that case.
>
> + {
> + if constexpr (__platform_wait_uses_type<uint32_t>)
> + if (args._M_obj_size == sizeof(uint32_t))
> + return false;
> +
> + if constexpr (__platform_wait_uses_type<uint64_t>)
> + if (args._M_obj_size == sizeof(uint64_t))
> + return false;
> +
> + // Use proxy wait for everything else:
> + return true;
> + }
> +
> } // namespace
>
> -// Called for a proxy wait
> +// Set _M_wait_state if using proxy wait, or caller wants contention
> tracking.
> +// Set _M_obj to &_M_wait_state->_M_ver if using proxy wait.
> +// Load the current value from _M_obj and store in _M_val.
> void
> -__wait_args::_M_load_proxy_wait_val(const void* addr)
> +__wait_args::_M_setup_wait_impl(const void* addr)
>
I would use _M_obj_size to indicate whether a proxy wait is used for the
actual wait.
> {
> - // __glibcxx_assert( *this & __wait_flags::__proxy_wait );
> + if (!use_proxy_wait(*this))
Do: _M_obj_size > 0 && !use_proxy_wait(*this)
> + {
> + // We can wait on this address directly.
> + __glibcxx_assert(_M_obj == addr);
>
This funciton is one that is expected to set _M_obj, so it seems incorrect
to assert it here.
>
> - // We always need a waitable state for proxy waits.
> + int val;
> + __atomic_load(static_cast<const int*>(addr), &val,
> __ATOMIC_ACQUIRE);
> + _M_old = val;
> +
> + return;
> + }
> +
> + // This will be a proxy wait, so get a waitable state.
> auto state = set_wait_state(addr, *this);
>
> + // The address we will wait on is the version count of the waitable
> state:
> + _M_obj = &state->_M_ver;
> +
> // Read the value of the _M_ver counter.
> - __atomic_load(&state->_M_ver, &_M_old, __ATOMIC_ACQUIRE);
> + __platform_wait_t __old;
> + __atomic_load(&state->_M_ver, &__old, __ATOMIC_ACQUIRE);
>
Nitpick, but I think this could be just:
__platform_wait_t __old = __atomic_load_n(&state->_M_ver, __ATOMIC_ACQUIRE);
> + _M_old = __old;
>
Or just:
_M_old = __atomic_load_n(&state->_M_ver, __ATOMIC_ACQUIRE);
> }
>
> __wait_result_type
> -__wait_impl(const void* __addr, __wait_args_base& __args)
> +__wait_impl([[maybe_unused]] const void* __addr, __wait_args_base& __args)
> {
> - auto __state = static_cast<__waitable_state*>(__args._M_wait_state);
> -
> - const __platform_wait_t* __wait_addr;
> -
> - if (__args & __wait_flags::__proxy_wait)
> - __wait_addr = &__state->_M_ver;
> - else
> - __wait_addr = static_cast<const __platform_wait_t*>(__addr);
> + auto* __wait_addr = static_cast<const
> __platform_wait_t*>(__args._M_obj);
>
> if (__args & __wait_flags::__do_spin)
> {
> @@ -286,6 +307,7 @@ __wait_impl(const void* __addr, __wait_args_base&
> __args)
> __atomic_load(__wait_addr, &__val, __args._M_order);
> if (__val == __args._M_old)
> {
> + auto __state = static_cast<__waitable_state*>(__args._M_wait_state);
> __state->_M_cv.wait(__state->_M_mtx);
> return { ._M_val = __val, ._M_has_val = false, ._M_timeout = false
> };
> }
> @@ -294,24 +316,40 @@ __wait_impl(const void* __addr, __wait_args_base&
> __args)
> }
>
> void
> -__notify_impl(const void* __addr, [[maybe_unused]] bool __all,
> +__notify_impl([[maybe_unused]] const void* __addr, [[maybe_unused]] bool
> __all,
> const __wait_args_base& __args)
> {
> - auto __state = static_cast<__waitable_state*>(__args._M_wait_state);
> - if (!__state)
> - __state = &__waitable_state::_S_state_for(__addr);
> + const bool __track_contention = __args &
> __wait_flags::__track_contention;
> + const bool proxy_wait = use_proxy_wait(__args);
>
And here we could just check whether _M_obj_size == 0, as that would be an
indication of a proxy wait.
>
> - [[maybe_unused]] const __platform_wait_t* __wait_addr;
> + [[maybe_unused]] auto* __wait_addr
> + = static_cast<const __platform_wait_t*>(__addr);
> +
> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> + // Check whether it would be a non-proxy wait for this object.
> + // This condition must match the one in _M_setup_wait_impl to ensure
> that
> + // the address used for the notify matches the one used for the wait.
> + if (!proxy_wait)
> + {
> + if (__track_contention)
> + if (!__waitable_state::_S_state_for(__addr)._M_waiting())
> + return;
> +
> + __platform_notify(__wait_addr, __all);
> + return;
> + }
> +#endif
> +
> + // Either a proxy wait or we don't have platform wait/wake primitives.
> +
> + auto __state = &__waitable_state::_S_state_for(__addr);
>
> // Lock mutex so that proxied waiters cannot race with incrementing
> _M_ver
> // and see the old value, then sleep after the increment and
> notify_all().
> lock_guard __l{ *__state };
>
> - if (__args & __wait_flags::__proxy_wait)
> + if (proxy_wait)
> {
> - // Waiting for *__addr is actually done on the proxy's _M_ver.
> - __wait_addr = &__state->_M_ver;
> -
> // Increment _M_ver so that waiting threads see something changed.
> // This has to be atomic because the load in _M_load_proxy_wait_val
> // is done without the mutex locked.
> @@ -322,11 +360,11 @@ __notify_impl(const void* __addr, [[maybe_unused]]
> bool __all,
> // they can re-evaluate their conditions to see if they should
> // stop waiting or should wait again.
> __all = true;
> - }
> - else // Use the atomic variable's own address.
> - __wait_addr = static_cast<const __platform_wait_t*>(__addr);
>
> - if (__args & __wait_flags::__track_contention)
> + __wait_addr = &__state->_M_ver;
> + }
> +
> + if (__track_contention)
> {
> if (!__state->_M_waiting())
> return;
> @@ -366,7 +404,7 @@ __platform_wait_until(const __platform_wait_t* __addr,
> }
> #endif // HAVE_LINUX_FUTEX
>
> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> +#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
> bool
> __cond_wait_until(__condvar& __cv, mutex& __mx,
> const __wait_clock_t::time_point& __atime)
> @@ -381,7 +419,7 @@ __cond_wait_until(__condvar& __cv, mutex& __mx,
> __cv.wait_until(__mx, __ts);
> return __wait_clock_t::now() < __atime;
> }
> -#endif // ! HAVE_PLATFORM_TIMED_WAIT
> +#endif // ! HAVE_PLATFORM_WAIT
>
> // Unlike __spin_impl, does not always return _M_has_val == true.
> // If the deadline has already passed then no fresh value is loaded.
> @@ -414,7 +452,9 @@ __spin_until_impl(const __platform_wait_t* __addr,
> return __res;
> }
>
> - __atomic_load(__addr, &__res._M_val, __args._M_order);
> + __platform_wait_t val;
> + __atomic_load(__addr, &val, __args._M_order);
>
Same comment on __atomic_load_n here.
> + __res._M_val = val;
> __res._M_has_val = true;
> if (__res._M_val != __args._M_old)
> {
> @@ -428,16 +468,11 @@ __spin_until_impl(const __platform_wait_t* __addr,
> } // namespace
>
> __wait_result_type
> -__wait_until_impl(const void* __addr, __wait_args_base& __args,
> +__wait_until_impl([[maybe_unused]] const void* __addr, __wait_args_base&
> __args,
> const __wait_clock_t::duration& __time)
> {
> const __wait_clock_t::time_point __atime(__time);
> - auto __state = static_cast<__waitable_state*>(__args._M_wait_state);
> - const __platform_wait_t* __wait_addr;
> - if (__args & __wait_flags::__proxy_wait)
> - __wait_addr = &__state->_M_ver;
> - else
> - __wait_addr = static_cast<const __platform_wait_t*>(__addr);
> + auto* __wait_addr = static_cast<const
> __platform_wait_t*>(__args._M_obj);
>
> if (__args & __wait_flags::__do_spin)
> {
> @@ -448,9 +483,9 @@ __wait_until_impl(const void* __addr,
> __wait_args_base& __args,
> return __res;
> }
>
> -#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
> if (__args & __wait_flags::__track_contention)
> - set_wait_state(__addr, __args);
> + set_wait_state(__addr, __args); // scoped_wait needs a
> __waitable_state
> scoped_wait s(__args);
> bool timeout = !__platform_wait_until(__wait_addr, __args._M_old,
> __atime);
> return { ._M_val = __args._M_old, ._M_has_val = false, ._M_timeout =
> timeout };
> @@ -460,6 +495,7 @@ __wait_until_impl(const void* __addr,
> __wait_args_base& __args,
> __atomic_load(__wait_addr, &__val, __args._M_order);
> if (__val == __args._M_old)
> {
> + auto __state = static_cast<__waitable_state*>(__args._M_wait_state);
> bool timeout = !__cond_wait_until(__state->_M_cv, __state->_M_mtx,
> __atime);
> return { ._M_val = __val, ._M_has_val = false, ._M_timeout =
> timeout };
> }
> --
> 2.51.1
>
>