On Thu, Dec 02, 2021 at 02:44:42PM +0000, Peter Maydell wrote:
> On Wed, 1 Dec 2021 at 17:19, Stefan Hajnoczi <stefa...@redhat.com> wrote:
> >
> > Compiler optimizations can cache TLS values across coroutine yield
> > points, resulting in stale values from the previous thread when a
> > coroutine is re-entered by a new thread.
> >
> > Serge Guelton developed an __attribute__((noinline)) wrapper and tested
> > it with clang and gcc. I formatted his idea according to QEMU's coding
> > style and wrote documentation.
>
> > +#ifdef QEMU_CO_TLS_ADDR
> > +#define QEMU_DEFINE_STATIC_CO_TLS(type, var) \
> > +    __thread type co_tls_##var; \
> > +    static inline type get_##var(void) \
> > +    { type *p; QEMU_CO_TLS_ADDR(p, co_tls_##var); return *p; } \
> > +    static inline void set_##var(type v) \
> > +    { type *p; QEMU_CO_TLS_ADDR(p, co_tls_##var); *p = v; } \
> > +    static inline type *get_ptr_##var(void) \
> > +    { type *p; QEMU_CO_TLS_ADDR(p, co_tls_##var); return p; }
> > +#else
> > +#define QEMU_DEFINE_STATIC_CO_TLS(type, var) \
> > +    static __thread type co_tls_##var; \
> > +    static __attribute__((noinline, unused)) type get_##var(void) \
> > +    { return co_tls_##var; } \
> > +    static __attribute__((noinline, unused)) void set_##var(type v) \
> > +    { co_tls_##var = v; } \
> > +    static __attribute__((noinline, unused)) type *get_ptr_##var(void) \
> > +    { return &co_tls_##var; }
> > +#endif
>
> My compiler-developer colleagues present the following case where
> 'noinline' is not sufficient for the compiler to definitely
> use different values of the address-of-the-TLS-var across an
> intervening function call:
>
>   __thread int i;
>
>   __attribute__((noinline)) long get_ptr_i()
>   {
>     return (long)&i;
>   }
>
>   void switcher();
>
>   int g()
>   {
>     long a = get_ptr_i();
>     switcher();
>     return a == get_ptr_i();
>   }

You can also force an extra mov through `volatile` as in https://godbolt.org/z/hWvdb7o9G
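
To spell the idea out inline, here is a minimal sketch. It is not a copy of
the linked snippet, only my reading of the approach applied to the
get_ptr_i() example quoted above. It assumes gcc or clang. The intptr_t
return type and the empty switcher() stub are illustrative choices of mine,
not part of the patch. As I understand it, the volatile store and load count
as side effects, so the compiler should no longer be able to treat
get_ptr_i() as a pure function and merge the two calls in g().

```c
#include <stdint.h>

static __thread int i;

/*
 * The TLS address goes through a volatile-qualified local pointer, so the
 * store and the reload must both be emitted inside the function body.
 */
__attribute__((noinline)) intptr_t get_ptr_i(void)
{
    int *volatile p = &i;   /* volatile store of the TLS address */
    return (intptr_t)p;     /* volatile load, not folded to a constant */
}

/* Hypothetical stand-in for a coroutine yield; it does nothing here. */
static void switcher(void)
{
}

int g(void)
{
    intptr_t a = get_ptr_i();
    switcher();
    /* Still 1 when no thread switch happened, but now evaluated at run time. */
    return a == get_ptr_i();
}
```

Whether this holds for every compiler version and optimization level is
something I have not checked beyond looking at the generated assembly.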