This patch series prototypes making QCOW2 fully asynchronous to eliminate the timing jitter and poor performance that have been observed. QCOW2 has asynchronous I/O code paths for some of the common read/write cases, but metadata access is always synchronous.
One solution is to rewrite QCOW2 to be fully asynchronous by splitting all functions that perform blocking I/O into a series of callbacks. Due to the complexity of QCOW2, this conversion and the maintenance prospects are unattractive.

This patch series prototypes an alternative solution to make QCOW2 asynchronous. It introduces coroutines, cooperative userspace threads of control, so that each QCOW2 request has its own call stack. To perform I/O, the coroutine submits an asynchronous I/O request and then yields back to QEMU. The coroutine stays suspended while the I/O operation is being processed by lower layers of the stack. When the asynchronous I/O completes, the coroutine is resumed.

The upshot of this is that QCOW2 can be implemented in a sequential fashion without explicit callbacks, but all I/O actually happens asynchronously under the covers.

This prototype implements reads, writes, and flushes. It should install or boot VMs successfully. However, it has the following limitations:

1. QCOW2 requests are serialized because the code is not yet safe for concurrent requests. See the last patch for details.

2. Coroutines are unoptimized. We should pool coroutines (and their mmapped stacks) to avoid the cost of coroutine creation.

3. The qcow2_aio_read_cb() and qcow2_aio_write_cb() functions should be refactored into sequential code now that callbacks are no longer needed.

I think this approach can solve the performance and functional problems of the current QCOW2 implementation. It does not require invasive changes; much of QCOW2 works unmodified.

Kevin: Do you like this approach and do you want to develop it further?

Stefan
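
For readers unfamiliar with coroutines, here is a minimal sketch of the submit/yield/resume flow described above. It is written in Python for brevity and is not code from the patch series. It uses Python generators as a stand-in for the coroutines, and the names fake_io(), qcow2_read(), and run(), along with the offset mapping and payloads, are hypothetical and chosen only for illustration.

```python
from collections import deque


def fake_io(op):
    """Hypothetical I/O backend: returns the result of a submitted request."""
    kind, arg = op
    if kind == "read_metadata":
        return arg + 0x10000          # made-up guest-to-host offset mapping
    if kind == "read_data":
        return f"data@{arg:#x}"       # made-up payload
    raise ValueError(kind)


def qcow2_read(name, guest_offset, log):
    """One request, written sequentially; each yield submits I/O and suspends."""
    log.append(f"{name}: submit metadata read")
    host_offset = yield ("read_metadata", guest_offset)
    log.append(f"{name}: submit data read")
    data = yield ("read_data", host_offset)
    log.append(f"{name}: done")
    return data


def run(requests):
    """Toy event loop: start each coroutine, then resume it as its I/O completes."""
    log, results, pending = [], {}, deque()
    for name, offset in requests:
        co = qcow2_read(name, offset, log)
        pending.append((name, co, next(co)))   # runs until the first yield
    while pending:
        name, co, op = pending.popleft()
        try:
            # Resume the coroutine with the I/O result; it runs to its next yield.
            pending.append((name, co, co.send(fake_io(op))))
        except StopIteration as stop:
            results[name] = stop.value         # the coroutine returned
    return results, log


if __name__ == "__main__":
    results, log = run([("A", 0x0), ("B", 0x1000)])
    print("\n".join(log))
    print(results)
```

The body of qcow2_read() reads top to bottom with no callbacks, yet control returns to the loop at every yield. The sketch interleaves two requests only to make the suspension visible; the prototype itself serializes QCOW2 requests, as noted in limitation 1.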