On 3/11/22 10:27, Stefan Hajnoczi wrote:
>> Not quite voluntarily, but I noticed I had to add one 0 to make them run for
>> a decent amount of time.  So yeah, it's much faster than siglongjmp.
>
> That's a nice first indication that performance will be good. I guess
> that deep coroutine_fn stacks could be less efficient with stackless
> coroutines compared to ucontext, but the cost of switching between
> coroutines (enter/yield) will be lower with stackless coroutines.

Note that right now I'm not placing the coroutine_fn stack on the heap, 
it's still allocated from a contiguous area in virtual address space. 
The contiguous allocation is wrapped by coroutine_stack_alloc and 
coroutine_stack_free, so it's really easy to change them to malloc and free.
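
A minimal sketch of what such a wrapper pair might look like. Only the names coroutine_stack_alloc and coroutine_stack_free come from the text above; the signatures, the arena size, the single global arena and the strict LIFO discipline are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

namespace contiguous {
// Hypothetical arena: one contiguous region, frames pushed and popped LIFO.
constexpr std::size_t ARENA_SIZE = 64 * 1024;   // assumed size
alignas(std::max_align_t) unsigned char arena[ARENA_SIZE];
std::size_t arena_top = 0;

// Keep every frame suitably aligned for any object type.
std::size_t round_up(std::size_t n) {
    const std::size_t a = alignof(std::max_align_t);
    return (n + a - 1) / a * a;
}

void *coroutine_stack_alloc(std::size_t bytes) {
    bytes = round_up(bytes);
    if (bytes > ARENA_SIZE - arena_top) return nullptr;   // arena exhausted
    void *p = arena + arena_top;
    arena_top += bytes;
    return p;
}

void coroutine_stack_free(void *p, std::size_t bytes) {
    bytes = round_up(bytes);
    // Frames must be released in reverse order of allocation.
    assert(static_cast<unsigned char *>(p) + bytes == arena + arena_top);
    arena_top -= bytes;
}
}  // namespace contiguous

namespace heap {
// Same interface, but every frame is an independent heap block.
void *coroutine_stack_alloc(std::size_t bytes) { return std::malloc(bytes); }
void coroutine_stack_free(void *p, std::size_t) { std::free(p); }
}  // namespace heap
```

Because callers only see the two functions, switching from one namespace to the other would not touch any coroutine_fn code.
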
I also do not have to walk up the whole call stack on coroutine_fn
yields, because calls from one coroutine_fn to the next are tail calls;
in exchange for that, I have more indirect calls than if the code did

    if (next_call() == COROUTINE_YIELD) {
        return COROUTINE_YIELD;
    }

For now the choice was again just the one that made the translation easiest.
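
To make the trade-off concrete, here is a rough sketch of the tail-call style, not the actual translator output. The Frame layout, the COROUTINE_DONE value, the global `current` pointer and the names inner/outer/enter are all hypothetical; only COROUTINE_YIELD appears above. A callee that finishes continues its caller through an indirect call, and a resume after a yield jumps straight to the innermost frame, so no caller ever has to test for COROUTINE_YIELD.

```cpp
#include <cassert>
#include <string>

// COROUTINE_YIELD appears in the original; COROUTINE_DONE is hypothetical.
enum CoroutineAction { COROUTINE_YIELD, COROUTINE_DONE };

struct Frame;
typedef CoroutineAction (*ResumeFn)(Frame *);

struct Frame {
    ResumeFn resume;   // function that continues this coroutine_fn
    Frame *caller;     // frame to continue once this one returns
    int state;         // resume point inside the function
};

static Frame *current;      // innermost frame of the coroutine
static std::string trace;   // records the order of execution

// Callee: yields once, then "returns" by tail-calling its caller.
CoroutineAction inner(Frame *f) {
    if (f->state == 0) {
        trace += "i0 ";
        f->state = 1;
        current = f;                         // where the next enter() resumes
        return COROUTINE_YIELD;
    }
    trace += "i1 ";
    return f->caller->resume(f->caller);     // indirect call in tail position
}

struct OuterFrame : Frame {
    Frame callee;      // the callee's frame lives inside the caller's frame
};

// Caller: never tests for COROUTINE_YIELD, it just tail-calls the callee.
CoroutineAction outer(Frame *f) {
    OuterFrame *self = static_cast<OuterFrame *>(f);
    if (self->state == 0) {
        trace += "o0 ";
        self->state = 1;
        self->callee = Frame{inner, self, 0};
        return inner(&self->callee);
    }
    trace += "o1 ";
    current = nullptr;
    return COROUTINE_DONE;
}

CoroutineAction enter() { return current->resume(current); }

std::string run_demo() {
    trace.clear();
    OuterFrame top{};
    top.resume = outer;
    current = &top;
    CoroutineAction first = enter();    // outer, then inner up to its yield
    assert(first == COROUTINE_YIELD);
    CoroutineAction second = enter();   // resumes inner, which continues outer
    assert(second == COROUTINE_DONE);
    return trace;
}
```

The second enter() starts in inner and reaches outer only through `f->caller->resume`, which is the extra indirect call mentioned above. C++ does not guarantee that a call in tail position is compiled as a jump, so the sketch shows the control flow rather than the stack usage.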

Today I also managed to implement a QEMU-like API on top of C++ coroutines:

    CoroutineFn<int> return_int() {
        co_await qemu_coroutine_yield();
        co_return 30;
    }

    CoroutineFn<void> return_void() {
        co_await qemu_coroutine_yield();
    }

    CoroutineFn<void> co(void *) {
        co_await return_void();
        printf("%d\n", co_await return_int());
        co_await qemu_coroutine_yield();
    }

    int main() {
        Coroutine *f = qemu_coroutine_create(co, NULL);
        printf("--- 0\n");
        qemu_coroutine_enter(f);
        printf("--- 1\n");
        qemu_coroutine_enter(f);
        printf("--- 2\n");
        qemu_coroutine_enter(f);
        printf("--- 3\n");
        qemu_coroutine_enter(f);
        printf("--- 4\n");
    }

The runtime code is absurdly obscure; my favorite bit is

    Yield qemu_coroutine_yield()
    {
        return Yield();
    }

:) However, at 200 lines of code it's certainly smaller than a
source-to-source translator.  It might be worth investigating a bit
more.  Only files that define or use a coroutine_fn (which includes
callers of qemu_coroutine_create) would have to be compiled as C++.
Paolo
