Venkateswararao Jujjuri wrote:
> This model makes the code simple and also in one shot we can convert
> all v9fs_do_syscalls into asynchronous threads. But as Aneesh raised
> will there be any additional overhead for the additional jumps? We
> can quickly test it out too.

I'm not sure if this is exactly the right place (I haven't followed
the whole discussion), but there is a useful trick for getting rid of
one of the thread context switches:

Swizzle *which* thread is your "main" coroutine thread.

Instead of queuing up an item on the work queue, waking the worker
thread pool, and having a worker thread pick up the coroutine, you:

Declare the current thread to *be* a worker thread from this point on,
and queue the calling context for a worker thread to pick up. When a
worker picks it up, *that* thread declares itself to be the main
coroutine thread.

So the coroutine entry step is just queuing a context for another
thread to pick up, and then diving into the blocking system call
(optimising out the enqueue/dequeue and thread switch).

In a sense, you make the "main" thread a long-lived work queue entry,
and have a symmetric pool, except that the main thread tends to behave
differently from the other work items.

This only works if the main thread's state is able to follow the
swizzling. I don't know if KVM VCPUs will do that, for example, or if
there's other per-thread state that won't work.

If the main thread can't be swizzled, you can still use this trick
when doing the coroutine->syscall step starting from an existing
worker thread.

-- Jamie
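
P.S. In case it helps, here is a rough sketch of the handoff described above. It is written in Python with plain threads rather than QEMU's C coroutines, so treat it as an illustration of the idea only. All names (`MainState`, `run_pool`, `baton`, `completions`, `fake_blocking_call`) are made up for this example. The key assumption is that the "main" role's state lives in an ordinary object and not in per-thread state, which is exactly the condition for the swizzle to work.

```python
import queue
import threading
import time


class MainState:
    """State of the "main" role. It must not depend on which thread holds it."""

    def __init__(self, jobs):
        self.pending = list(jobs)  # blocking calls still to be issued
        self.outstanding = 0       # calls issued but not yet reaped
        self.results = []
        self.owners = []           # thread names, in the order they became "main"


def fake_blocking_call(n):
    """Hypothetical stand-in for a blocking system call."""
    time.sleep(0.01)
    return n * n


def run_pool(jobs, n_threads=3):
    baton = queue.Queue()        # carries MainState to the next "main" thread
    completions = queue.Queue()  # results posted by threads that finished a call
    state = MainState(jobs)

    def next_blocking_call(st):
        """Run the main role until it needs a blocking call; None means done."""
        while True:
            if st.pending:
                st.outstanding += 1
                return st.pending.pop(0)
            if st.outstanding == 0:
                return None
            st.results.append(completions.get())  # wait for a call to finish
            st.outstanding -= 1

    def pool_thread():
        while True:
            st = baton.get()          # pick up the main role
            if st is None:
                return                # shutdown marker
            st.owners.append(threading.current_thread().name)
            call = next_blocking_call(st)
            if call is None:
                for _ in range(n_threads - 1):
                    baton.put(None)   # release the other threads
                return
            baton.put(st)             # swizzle: another thread is "main" now
            completions.put(call())   # block right here, no work-queue hop

    threads = [threading.Thread(target=pool_thread, name=f"t{i}")
               for i in range(n_threads)]
    for t in threads:
        t.start()
    baton.put(state)                  # the first thread to grab it starts as main
    for t in threads:
        t.join()
    return state


if __name__ == "__main__":
    jobs = [lambda i=i: fake_blocking_call(i) for i in range(5)]
    final = run_pool(jobs)
    print(sorted(final.results))
    print(final.owners)               # which thread held the main role, in order
```

Note that the pool is symmetric: every thread runs the same loop, and the thread that was "main" runs the blocking call itself after handing the state over. The order of names in `owners` depends on scheduling, so it will vary from run to run.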