Re: [Qemu-devel] [PATCH 1/7] coroutine-ucontext: use __thread

Markus Armbruster Fri, 28 Nov 2014 06:47:24 -0800

Paolo Bonzini <pbonz...@redhat.com> writes:

> ELF thread local storage is about 10% faster on tests/test-coroutine's
> perf/cost test.  The timing on my machine is 160ns per iteration with
> pthread TLS, 145 with ELF TLS.
>
> Based on a patch by Kevin Wolf and Peter Lieven, but redone to follow
> the model of coroutine-win32.c (including the important "noinline"
> attribute!!!).
>
> Platforms without thread-local storage (OpenBSD probably?) will need
> a new-enough GCC for this to compile, in order to use the same emutls
> support that Windows already relies on.
[...]
> @@ -193,15 +155,22 @@ void qemu_coroutine_delete(Coroutine *co_)
>      g_free(co);
>  }
>  
> +/* This function is marked noinline to prevent GCC from inlining it
> + * into coroutine_trampoline(). If we allow it to do that then it
> + * hoists the code to get the address of the TLS variable "current"
> + * out of the while() loop. This is an invalid transformation because
> + * the SwitchToFiber() call may be called when running thread A but
> + * return in thread B, and so we might be in a different thread
> + * context each time round the loop.
> + */
>  CoroutineAction qemu_coroutine_switch(Coroutine *from_, Coroutine *to_,
>                                        CoroutineAction action)


Err, the comment says the function is marked noinline, but the definition
itself carries no __attribute__((noinline)).  Did you forget to add it?
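
For illustration only, here is a minimal sketch of what I would expect the attribute to look like, assuming GCC-style attributes and `__thread`. The `Coroutine` struct and the `set_current()`, `get_current()` and `check_current_roundtrip()` helpers are hypothetical stand-ins, not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMU's Coroutine type. */
typedef struct Coroutine {
    int id;
} Coroutine;

/* ELF TLS: every thread gets its own copy of "current". */
static __thread Coroutine *current;

/*
 * noinline keeps the TLS address computation inside these functions,
 * so the compiler cannot hoist it out of a loop in the caller and
 * reuse an address that belongs to a different thread.
 */
static void __attribute__((noinline)) set_current(Coroutine *co)
{
    current = co;
}

static Coroutine * __attribute__((noinline)) get_current(void)
{
    return current;
}

/* Stores a coroutine, reads it back, and returns its id. */
int check_current_roundtrip(void)
{
    Coroutine co = { 42 };

    set_current(&co);
    assert(get_current() == &co);
    return get_current()->id;
}
```

The point of routing every access through a noinline function is that the TLS address is recomputed on each call, which is what the comment in the patch asks for.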

>  {
>      CoroutineUContext *from = DO_UPCAST(CoroutineUContext, base, from_);
>      CoroutineUContext *to = DO_UPCAST(CoroutineUContext, base, to_);
> -    CoroutineThreadState *s = coroutine_get_thread_state();
>      int ret;
>  
> -    s->current = to_;
> +    current = to_;
>  
>      ret = sigsetjmp(from->env, 0);
>      if (ret == 0) {
[...]
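
To make the pthread TLS versus ELF TLS comparison from the commit message concrete, here is a rough sketch of the two access styles side by side. It assumes the pthread variant uses a key created via pthread_once(), which is a common pattern; all function and variable names below are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* pthread TLS: a key is created once, then looked up on every access. */
static pthread_key_t current_key;
static pthread_once_t current_key_once = PTHREAD_ONCE_INIT;

static void make_current_key(void)
{
    int rc = pthread_key_create(&current_key, NULL);

    assert(rc == 0);
    (void)rc;
}

void set_current_pthread(void *co)
{
    pthread_once(&current_key_once, make_current_key);
    pthread_setspecific(current_key, co);
}

void *get_current_pthread(void)
{
    pthread_once(&current_key_once, make_current_key);
    return pthread_getspecific(current_key);
}

/* ELF TLS: the compiler and linker resolve the per-thread address. */
static __thread void *current_elf;

void set_current_elf(void *co)
{
    current_elf = co;
}

void *get_current_elf(void)
{
    return current_elf;
}
```

The pthread variant pays for a pthread_once() check and a library call on every access, while the ELF variant is a plain load or store on the per-thread variable. That difference is a plausible source of the speedup quoted above, though I have not measured it.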
