As discussed in the other thread, this series speeds up coroutines by dropping the coroutine mutex (which also serializes multiple iothreads) and by using ELF thread-local storage.
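For readers who have not followed the other thread, here is a minimal sketch of why thread-local storage removes the need for a mutex around a free list. It is only an illustration, not the code in this series: `Item`, `item_get`, `item_put` and `POOL_MAX` are hypothetical names, and the sketch assumes a compiler that supports the `__thread` keyword (GCC or Clang).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pooled object; not QEMU's Coroutine. */
typedef struct Item {
    struct Item *next;
} Item;

enum { POOL_MAX = 64 };   /* arbitrary cap chosen for this sketch */

/*
 * One free list per thread.  No other thread can reach these
 * variables, so pushing and popping needs no lock.
 */
static __thread Item *pool_head;
static __thread unsigned pool_size;

/* Take an object from this thread's pool, or allocate a new one. */
static Item *item_get(void)
{
    Item *it = pool_head;

    if (it) {
        pool_head = it->next;
        pool_size--;
        return it;
    }
    return calloc(1, sizeof(*it));
}

/* Return an object to this thread's pool, or free it if the pool is full. */
static void item_put(Item *it)
{
    assert(it != NULL);
    if (pool_size < POOL_MAX) {
        it->next = pool_head;
        pool_head = it;
        pool_size++;
    } else {
        free(it);
    }
}
```

An object released with `item_put` is handed back by the next `item_get` on the same thread, without taking any lock. Objects still sitting in a thread's list when that thread exits would leak in this sketch; a real implementation needs some per-thread cleanup hook for that.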
The speedup in perf/cost is about 30% (190->145).

Windows port tested with tests/test-coroutine.exe under Wine.

Paolo

Paolo Bonzini (7):
  coroutine-ucontext: use __thread
  qemu-thread: add per-thread atexit functions
  test-coroutine: avoid overflow on 32-bit systems
  QSLIST: add lock-free operations
  coroutine: rewrite pool to avoid mutex
  coroutine: drop qemu_coroutine_adjust_pool_size
  coroutine: try harder not to delete coroutines

 block/block-backend.c     |   4 --
 coroutine-ucontext.c      |  64 +++++++---------------------
 include/block/coroutine.h |  10 -----
 include/qemu/queue.h      |  15 ++++++-
 include/qemu/thread.h     |   4 ++
 qemu-coroutine.c          | 104 ++++++++++++++++++++++------------------------
 tests/test-coroutine.c    |   2 +-
 util/qemu-thread-posix.c  |  37 +++++++++++++++++
 util/qemu-thread-win32.c  |  48 ++++++++++++++++-----
 9 files changed, 157 insertions(+), 131 deletions(-)

-- 
2.1.0