Hi Florian,

This also affects fibre implementations (both the C++ and D ones, at least, going by discussions with both communities).

> On 20 Jul 2021, at 15:52, Florian Weimer via Gcc <gcc@gcc.gnu.org> wrote:
>
> Currently, the GNU/Linux ABI does not really specify whether the thread
> pointer (the address of the TCB) may change at a function boundary.
>
> Traditionally, GCC assumes that the ABI allows caching addresses of
> thread-local variables across function calls. Such caching varies in
> aggressiveness between targets, probably due to differences in the
> choice of -mtls-dialect=gnu and -mtls-dialect=gnu2 as the default for
> the targets. (Caching with -mtls-dialect=gnu2 appears to be more
> aggressive.)
>
> In addition to that, glibc defines errno as this:
>
>   extern int *__errno_location (void) __attribute__ ((__const__));
>   #define errno (*__errno_location ())
>
> And the const attribute has the side effect of caching the address of
> errno within the same stack frame.
>
> With stackful coroutines, such address caching is only valid if
> coroutines are only ever resumed on the same thread on which they were
> suspended. (The C++ coroutine implementation is not stackful and is not
> affected by this at the ABI level.)

There are C++20 coroutine library writers who want to switch threads in symmetric transfers [ I am not entirely convinced about this at present and it certainly would be suspect with TLS address caching enabled - since a TLS pointer could equally be cached in the coroutine frame ].

The C++20 coroutine ABI is silent on such matters (it only describes the visible part of the coroutine frame and the builtins used by the std library).

> Historically, I think we took the
> position that cross-thread resumption is undefined. But the ABIs aren't
> crystal-clear on this matter.
>
> One important piece of software for GNU is QEMU (not just for GNU/Linux,
> Hurd development also benefits from virtualization). QEMU uses stackful
> coroutines extensively. There are some hard-to-change code areas where
> resumption happens across threads unfortunately. These increasingly
> cause problems with more inlining, inter-procedural analysis, and a
> general push towards LTO (which is also needed for some security
> hardening features).
>
> Should the GNU toolchain offer something to help out the QEMU
> developers? Maybe GCC could offer an option to disable the caching for
> all TLS models. glibc could detect that mode based on a new
> preprocessor macro and adjust its __errno_location declaration and
> similar function declarations. There will be a performance impact of
> this, of course, but it would make the QEMU usage well-defined (at the
> lowest levels).
>
> If this is a programming model that should be supported, then restoring
> some of the optimizations would be possible, by annotating
> context-switching functions and TLS-address-dependent functions. But I
> think QEMU would immediately benefit from just the simple approach that
> disables address caching of TLS variables.

IMO the general cases you note above are enough reason to want some mechanism to control this,

thanks
Iain

> Thanks,
> Florian
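
For reference, a minimal sketch of why a cached TLS address goes stale when execution moves to another thread. This is not taken from glibc or QEMU: demo_errno, demo_errno_location, record_address and tls_address_differs_across_threads are hypothetical names made up for illustration, and the accessor is deliberately declared without the const attribute that the quoted __errno_location declaration carries. It only shows that the same thread-local variable lives at a different address in each thread, so an address computed before a cross-thread resumption still refers to the original thread's copy.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread variable standing in for errno. */
static _Thread_local int demo_errno;

/* Hypothetical accessor. Unlike the quoted glibc declaration, it does
   not carry __attribute__((const)), so the declaration makes no promise
   that repeated calls return the same address. */
int *demo_errno_location(void)
{
    return &demo_errno;
}

/* Thread body: record the address of this thread's copy as an integer. */
static void *record_address(void *out)
{
    *(uintptr_t *)out = (uintptr_t)demo_errno_location();
    return NULL;
}

/* Returns 1 if the calling thread and a freshly created thread see
   different addresses for demo_errno, 0 if they see the same address,
   and -1 if the helper thread could not be created or joined. */
int tls_address_differs_across_threads(void)
{
    uintptr_t here = (uintptr_t)demo_errno_location();
    uintptr_t there = 0;
    pthread_t helper;

    if (pthread_create(&helper, NULL, record_address, &there) != 0)
        return -1;
    if (pthread_join(helper, NULL) != 0)
        return -1;

    return here != there;
}
```

Calling tls_address_differs_across_threads() from any thread should report that the two addresses differ, since both copies are live when the helper records its address. Older glibc versions need -pthread at link time.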