Paolo Bonzini <pbonz...@redhat.com> writes:

> ELF thread local storage is about 10% faster on tests/test-coroutine's
> perf/cost test. The timing on my machine is 160ns per iteration with
> pthread TLS, 145 with ELF TLS.
>
> Based on a patch by Kevin Wolf and Peter Lieven, but redone to follow
> the model of coroutine-win32.c (including the important "noinline"
> attribute!!!).
>
> Platforms without thread-local storage (OpenBSD probably?) will need
> a new-enough GCC for this to compile, in order to use the same emutls
> support that Windows already relies on.
[...]
> @@ -193,15 +155,22 @@ void qemu_coroutine_delete(Coroutine *co_)
>      g_free(co);
>  }
>
> +/* This function is marked noinline to prevent GCC from inlining it
> + * into coroutine_trampoline(). If we allow it to do that then it
> + * hoists the code to get the address of the TLS variable "current"
> + * out of the while() loop. This is an invalid transformation because
> + * the SwitchToFiber() call may be called when running thread A but
> + * return in thread B, and so we might be in a different thread
> + * context each time round the loop.
> + */
>  CoroutineAction qemu_coroutine_switch(Coroutine *from_, Coroutine *to_,
>                                        CoroutineAction action)

Err, did you forget the actual __attribute__((noinline))?

>  {
>      CoroutineUContext *from = DO_UPCAST(CoroutineUContext, base, from_);
>      CoroutineUContext *to = DO_UPCAST(CoroutineUContext, base, to_);
> -    CoroutineThreadState *s = coroutine_get_thread_state();
>      int ret;
>
> -    s->current = to_;
> +    current = to_;
>
>      ret = sigsetjmp(from->env, 0);
>      if (ret == 0) {
[...]
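
To make the question concrete, here is a minimal sketch of the pattern the comment describes, assuming GCC/Clang attribute syntax and the __thread extension. The Coroutine stand-in and the set_current()/get_current() helpers are hypothetical and are not taken from the patch; the point is only that the attribute has to appear on the function itself, not just in the comment above it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMU's Coroutine type. */
typedef struct Coroutine {
    int id;
} Coroutine;

/* ELF TLS variable; __thread is a GCC/Clang extension. */
static __thread Coroutine *current;

/* noinline keeps the TLS address computation inside this function,
 * so a caller's loop cannot hoist it and reuse an address that was
 * computed while running on a different thread. */
static __attribute__((noinline)) void set_current(Coroutine *co)
{
    assert(co != NULL);
    current = co;
}

/* Same reasoning for the read side: the TLS lookup is redone on
 * every call instead of being cached by the caller. */
static __attribute__((noinline)) Coroutine *get_current(void)
{
    return current;
}
```

In the patch itself I would expect the attribute on the qemu_coroutine_switch() definition, but where exactly it goes is the author's call.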