It turns out that going from a prototype C++ implementation of the QEMU
API to something that could build tests/unit/test-coroutine was just a
few hours' work; once it compiled, only one line had to be changed for
every test to pass.

Most of the differences between C and C++ already show up here:

- keywords such as "new" (or "class", which I didn't encounter yet)

- _Generic must be replaced by templates and/or overloading
  (QemuCoLockable is implemented completely differently from
  QemuLockable; in fact I spent most of the time on that)

- PRI* macros must be separated with a space from string constants that
  precede them

- void* casts must be explicit (g_new takes care of that most of the
  time, but not for opaque pointers passed to coroutines).

There are 300 lines of hard-core C++ in the backend and in coroutine.h.
I tried to comment it as much as possible (this time I didn't include a
big commit message on stackless coroutines in general) but it still
requires some knowledge of the basic C++ coroutine concepts of resumable
types, promise types and awaiter types.
https://www.youtube.com/watch?v=ZTqHjjm86Bw is an excellent introduction
and it's where I learnt most of what was needed.

However, there are no ramifications to actual coroutine code, except for
the template syntax "CoroutineFn<return_type>" for the function and the
mandatory co_await/co_return keywords... both of which are an
improvement, really: the compiler now checks that a given function runs
either inside or outside coroutines, but never both, because
qemu_coroutine_create accepts a function that returns CoroutineFn<void>.
Therefore I had to disable some more code in util/ and qapi/ that used
qemu_in_coroutine() or coroutine_fn.

Here is the performance comparison of the three backends:

                   ucontext            stackless C    stackless C++
/perf/lifecycle    0.068 s             0.025 s        0.065 s
/perf/nesting      55 s                4.7 s          1.7 s
/perf/yield        6.0 s               1.3 s          1.3 s
/perf/cost         8 Mops/s (125 ns)   35 ns          10 Mops/s (99 ns)

One important difference is that C++ coroutines allocate frames on the
heap, and that explains why performance is better in /perf/nesting,
which has to do many large memory allocations for the stack in the other
two backends (and also a makecontext/swapcontext in the ucontext case).
C++ coroutines hardly benefit from the coroutine pool; OTOH that also
means the coroutine pool could be removed if we went this way.

I haven't checked why /perf/lifecycle (and therefore /perf/cost; they
are roughly the same test) is so much slower than the handwritten C
code. It's still comparable with the ucontext backend though.

Overall this was about twice the amount of work of the C experiment, but
that's because the two are very different ways to achieve the same goal:

- the design work was substantially smaller in the C experiment, where
  all the backend does is allocate stack frames and do a loop that
  invokes a function pointer. Here the backend has to map between the
  C++ concepts and the QEMU API. In the C case, most of the work was
  really in the manual conversion, which I had to do one function at a
  time.

- the remaining work is also completely different: a source-to-source
  translator (and only build system work in QEMU) for the C experiment;
  making ~100 files compile in C++ for this one (and relatively little
  work as far as coroutines are concerned).

This was compiled with GCC 11 only. Coroutine support was added in
GCC 10, released in 2020, which IIRC is much newer than the oldest
release we support.

Paolo

Paolo Bonzini (16):
  coroutine: add missing coroutine_fn annotations for CoRwlock functions
  coroutine: qemu_coroutine_get_aio_context is not a coroutine_fn
  coroutine: small code cleanup in qemu_co_rwlock_wrlock
  coroutine: introduce QemuCoLockable
  port atomic.h to C++
  use g_new0 instead of g_malloc0
  start porting compiler.h to C++
  tracetool: add extern "C" around generated headers
  start adding extern "C" markers
  add space between liter and string macro
  bump to C++20
  remove "new" keyword from trace-events
  disable some code
  util: introduce C++ stackless coroutine backend
  port QemuCoLockable to C++ coroutines
  port test-coroutine to C++ coroutines

 configure                                     |  48 +-
 include/block/aio.h                           |   5 +
 include/fpu/softfloat-types.h                 |   4 +
 include/qemu/atomic.h                         |   5 +
 include/qemu/bitops.h                         |   3 +
 include/qemu/bswap.h                          |  10 +-
 include/qemu/co-lockable.h                    |  93 ++++
 include/qemu/compiler.h                       |   4 +
 include/qemu/coroutine.h                      | 466 +++++++++++++-----
 include/qemu/coroutine_int.h                  |   8 +
 include/qemu/host-utils.h                     |   4 +
 include/qemu/lockable.h                       |  13 +-
 include/qemu/notify.h                         |   4 +
 include/qemu/osdep.h                          |   1 +
 include/qemu/qsp.h                            |   4 +
 include/qemu/thread.h                         |   4 +
 include/qemu/timer.h                          |   6 +-
 include/qemu/typedefs.h                       |   1 +
 meson.build                                   |   2 +-
 qapi/qmp-dispatch.c                           |   2 +
 scripts/tracetool/format/h.py                 |   8 +-
 tests/unit/meson.build                        |   8 +-
 .../{test-coroutine.c => test-coroutine.cc}   | 138 +++---
 util/async.c                                  |   2 +
 util/coroutine-stackless.cc                   | 145 ++++++
 util/meson.build                              |  14 +-
 ...oroutine-lock.c => qemu-coroutine-lock.cc} |  78 +--
 ...outine-sleep.c => qemu-coroutine-sleep.cc} |  10 +-
 util/{qemu-coroutine.c => qemu-coroutine.cc}  |  18 +-
 util/thread-pool.c                            |   2 +
 util/trace-events                             |  40 +-
 31 files changed, 805 insertions(+), 345 deletions(-)
 create mode 100644 include/qemu/co-lockable.h
 rename tests/unit/{test-coroutine.c => test-coroutine.cc} (81%)
 create mode 100644 util/coroutine-stackless.cc
 rename util/{qemu-coroutine-lock.c => qemu-coroutine-lock.cc} (86%)
 rename util/{qemu-coroutine-sleep.c => qemu-coroutine-sleep.cc} (89%)
 rename util/{qemu-coroutine.c => qemu-coroutine.cc} (93%)

-- 
2.35.1