On Mar 3, 2023, Jonathan Wakely <jwak...@redhat.com> wrote:

> On Fri, 3 Mar 2023 at 09:33, Jonathan Wakely <jwak...@redhat.com> wrote:
>> Jakub previously suggested doing this for PR 61841, which was a similar
>> problem with pthread_create:
>>
>> __asm ("" : : "r" (&pthread_create)); would not be optimized away.
>>
>>
>> That would avoid the multiple copies.

Not really.  There would still be multiple copies of the code that loads
pthread_create's address.  And we don't really need the address; a
single never-executed call would do.

I've explored these possibilities a bit, and here's what I've come up
with: a private static member function, output in every unit that
instantiates the thread template ctor, whose address is passed to
_M_start_thread.  Since it's never actually called, we don't really need
the hacks in some of the alternatives, which I left in place mainly for
your enjoyment.

They all work equally well and are just as efficient per instantiation
at runtime, with slightly different space and loading overheads.  The
last one, the one that is enabled, is my favorite: it needs only PLT
relocations, which we'd likely get anyway, and no full-address
resolution.  Thanks to the type cast, the call sequences are as short as
possible, yet still enough to get a relocation with a strong reference
that pulls the symbol in when linking.

As a bonus, I put in (at the last minute, after my test runs) something
to keep even LTO happy: asm statements in _M_start_thread that prevent
depend from being optimized out.  In non-LTO builds, the impact should
be virtually zero.

How does this look?  (minus the #if 0/#elif 0/.../#else)


link pthread_join from std::thread ctor

Like pthread_create, pthread_join may fail to be statically linked in
absent strong uses, so add to user code strong references to both when
std::thread objects are created.


for  libstdc++-v3/ChangeLog

	* include/bits/std_thread.h (thread::_M_thread_deps_never_run): New
	static inline function.
	(std::thread template ctor): Pass it to _M_start_thread.
	* src/c++11/thread.cc (thread::_M_start_thread): Name depend
	parameter, force it live on entry.
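
For anyone who wants to try the idea outside libstdc++, the enabled
variant can be sketched roughly as below.  This is only an illustration
under stated assumptions, not part of the patch: deps_never_run,
start_thread and make_thread_like are hypothetical names standing in for
_M_thread_deps_never_run, _M_start_thread and the template ctor, and the
sketch assumes GCC or Clang (for the asm statement and the attribute).

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical stand-in for thread::_M_thread_deps_never_run.  It is
// never called; emitting it is enough to create strong references to
// pthread_create and pthread_join in this translation unit.
static void deps_never_run()
{
  // The cast to void (*)(void) keeps the call sequences short, since no
  // arguments have to be set up.
  reinterpret_cast<void (*)(void)>(&pthread_create)();
  reinterpret_cast<void (*)(void)>(&pthread_join)();
}

// Hypothetical stand-in for thread::_M_start_thread.
__attribute__((__noinline__)) bool
start_thread(void (*depend)())
{
  // Empty asm taking depend as an input, so the argument stays live.
  asm ("" : : "rm" (depend));
  assert(depend != nullptr);
  return true;
}

// Mirrors the template ctor: it passes the address along but never
// calls through it.
bool make_thread_like()
{
  return start_thread(deps_never_run);
}
```

Calling make_thread_like() from main should be enough to see undefined
references to both symbols in the object file (for instance with nm).
With glibc older than 2.34 the link step also needs -pthread.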
---
 libstdc++-v3/include/bits/std_thread.h |   51 ++++++++++++++++++++++++++++----
 libstdc++-v3/src/c++11/thread.cc       |   10 +++++-
 2 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/libstdc++-v3/include/bits/std_thread.h b/libstdc++-v3/include/bits/std_thread.h
index adbd3928ff783..3ffd2a823a698 100644
--- a/libstdc++-v3/include/bits/std_thread.h
+++ b/libstdc++-v3/include/bits/std_thread.h
@@ -132,6 +132,49 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     thread() noexcept = default;
 
 #ifdef _GLIBCXX_HAS_GTHREADS
+  private:
+    // This adds to user code that creates std::thread objects (because
+    // it is called by the template ctor below) strong references to
+    // pthread_create and pthread_join, which ensures they are both
+    // linked in even during static linking.  We can't depend on
+    // gthread calls to bring them in, because those may use weak
+    // references.
+    static void
+    _M_thread_deps_never_run() {
+#ifdef GTHR_ACTIVE_PROXY
+#if 0
+      static auto const __attribute__ ((__used__)) _M_create = pthread_create;
+      static auto const __attribute__ ((__used__)) _M_join = pthread_join;
+#elif 0
+      pthread_t thr;
+      pthread_create (&thr, nullptr, nullptr, nullptr);
+      pthread_join (thr, nullptr);
+#elif 0
+      asm goto ("" : : : : _M_never_run);
+      if (0)
+        {
+        _M_never_run:
+          pthread_t thr;
+          pthread_create (&thr, nullptr, nullptr, nullptr);
+          pthread_join (thr, nullptr);
+        }
+#elif 0
+      bool _M_skip_always = false;
+      asm ("" : "+rm" (_M_skip_always));
+      if (__builtin_expect (_M_skip_always, false))
+        {
+          pthread_t thr;
+          pthread_create (&thr, nullptr, nullptr, nullptr);
+          pthread_join (thr, nullptr);
+        }
+#else
+      reinterpret_cast<void (*)(void)>(&pthread_create)();
+      reinterpret_cast<void (*)(void)>(&pthread_join)();
+#endif
+#endif
+    }
+
+  public:
     template<typename _Callable, typename... _Args,
              typename = _Require<__not_same<_Callable>>>
       explicit
@@ -142,18 +185,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
           "std::thread arguments must be invocable after conversion to rvalues"
           );
 
-#ifdef GTHR_ACTIVE_PROXY
-        // Create a reference to pthread_create, not just the gthr weak symbol.
-        auto __depend = reinterpret_cast<void(*)()>(&pthread_create);
-#else
-        auto __depend = nullptr;
-#endif
         using _Wrapper = _Call_wrapper<_Callable, _Args...>;
         // Create a call wrapper with DECAY_COPY(__f) as its target object
         // and DECAY_COPY(__args)... as its bound argument entities.
         _M_start_thread(_State_ptr(new _State_impl<_Wrapper>(
               std::forward<_Callable>(__f), std::forward<_Args>(__args)...)),
-            __depend);
+            _M_thread_deps_never_run);
       }
 #endif // _GLIBCXX_HAS_GTHREADS
 
diff --git a/libstdc++-v3/src/c++11/thread.cc b/libstdc++-v3/src/c++11/thread.cc
index 2d5ffaf678e97..c91f7b02e1f3f 100644
--- a/libstdc++-v3/src/c++11/thread.cc
+++ b/libstdc++-v3/src/c++11/thread.cc
@@ -154,8 +154,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
   void
-  thread::_M_start_thread(_State_ptr state, void (*)())
+  thread::_M_start_thread(_State_ptr state, void (*depend)())
   {
+    // Make sure it's not optimized out, not even with LTO.
+    asm ("" : : "rm" (depend));
+
     if (!__gthread_active_p())
       {
 #if __cpp_exceptions
@@ -190,8 +193,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
   void
-  thread::_M_start_thread(__shared_base_type __b, void (*)())
+  thread::_M_start_thread(__shared_base_type __b, void (*depend)())
   {
+    // Make sure it's not optimized out, not even with LTO.
+    asm ("" : : "rm" (depend));
+
     auto ptr = __b.get();
     // Create a reference cycle that will be broken in the new thread.
     ptr->_M_this_ptr = std::move(__b);

-- 
Alexandre Oliva, happy hacker                https://FSFLA.org/blogs/lxo/
   Free Software Activist                       GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about <https://stallmansupport.org>