On Wed, 1 Dec 2021 at 17:19, Stefan Hajnoczi <stefa...@redhat.com> wrote:
>
> Compiler optimizations can cache TLS values across coroutine yield
> points, resulting in stale values from the previous thread when a
> coroutine is re-entered by a new thread.
>
> Serge Guelton developed an __attribute__((noinline)) wrapper and tested
> it with clang and gcc. I formatted his idea according to QEMU's coding
> style and wrote documentation.
> +#ifdef QEMU_CO_TLS_ADDR
> +#define QEMU_DEFINE_STATIC_CO_TLS(type, var) \
> +    __thread type co_tls_##var; \
> +    static inline type get_##var(void) \
> +    { type *p; QEMU_CO_TLS_ADDR(p, co_tls_##var); return *p; } \
> +    static inline void set_##var(type v) \
> +    { type *p; QEMU_CO_TLS_ADDR(p, co_tls_##var); *p = v; } \
> +    static inline type *get_ptr_##var(void) \
> +    { type *p; QEMU_CO_TLS_ADDR(p, co_tls_##var); return p; }
> +#else
> +#define QEMU_DEFINE_STATIC_CO_TLS(type, var) \
> +    static __thread type co_tls_##var; \
> +    static __attribute__((noinline, unused)) type get_##var(void) \
> +    { return co_tls_##var; } \
> +    static __attribute__((noinline, unused)) void set_##var(type v) \
> +    { co_tls_##var = v; } \
> +    static __attribute__((noinline, unused)) type *get_ptr_##var(void) \
> +    { return &co_tls_##var; }
> +#endif

My compiler-developer colleagues present the following case, in which
'noinline' is not sufficient to make the compiler treat the address of
the TLS variable as possibly different across an intervening function
call:

    __thread int i;

    __attribute__((noinline)) long get_ptr_i()
    {
        return (long)&i;
    }

    void switcher();

    int g()
    {
        long a = get_ptr_i();
        switcher();
        return a == get_ptr_i();
    }

Trunk clang optimizes the comparison of the two pointers down to
"always true", even though the pointers can differ if switcher() has
resulted in a change of thread:
https://godbolt.org/z/hd67zh6jW

'noinline' only prevents inlining; it does not stop interprocedural
analysis from concluding that both calls return the same value.

The 'optnone' attribute (clang-specific) seems to be able to suppress
this optimization. The equivalent on gcc may (or may not) be 'noipa'.

-- PMM