This patch series switches virtio-blk data-plane from a custom Linux AIO request queue to the QEMU block layer. The previous "raw files only" limitation is lifted. All image formats and protocols can now be used with virtio-blk data-plane.
How to review this series
-------------------------
I CCed the maintainer of each block driver that I modified.  You probably
don't need to review the entire series, just your patch.

From now on fd handlers, timers, BHs, and event loop waits must explicitly
use the BlockDriverState's AioContext instead of the main loop.  Use
bdrv_get_aio_context(bs) to get the AioContext.  The following function
calls need to be converted (see the first sketch below for a rough
illustration):

 * qemu_aio_set_fd_handler() -> aio_set_fd_handler()
 * timer_new*() -> aio_timer_new()
 * qemu_bh_new() -> aio_bh_new()
 * qemu_aio_wait() -> aio_poll(aio_context, true)

For simple block drivers this modification suffices and the driver is then
safe to use outside the QEMU global mutex.

Block drivers that keep fd handlers, timers, or BHs registered while
requests are drained need a little more work.  Examples of this are network
block drivers with keepalive timers, like iSCSI.

This series adds a new bdrv_set_aio_context(bs, aio_context) function that
moves a BlockDriverState into a new AioContext.  This function calls the
block driver's optional .bdrv_detach_aio_context() and
.bdrv_attach_aio_context() functions.  Implement detach/attach to move the
fd handlers, timers, or BHs to the new AioContext (see the second sketch
below).

Finally, block drivers that manage their own child nodes also need to
implement detach/attach because the generic block layer doesn't know about
their children.  Both ->file and ->backing_hd are taken care of
automatically, but blkverify, quorum, and VMDK need to manually propagate
detach/attach to their children.

I have audited and modified all block drivers.  Block driver maintainers,
please check that I did this correctly and didn't break your code.

Background
----------
The block layer is currently tied to the QEMU main loop for fd handlers,
timer callbacks, and BHs.  This means that even on hosts with many cores,
parts of block I/O processing happen in one thread and depend on the QEMU
global mutex.

virtio-blk data-plane has shown that 1,000,000 IOPS is achievable if we use
additional threads that are not under the QEMU global mutex.

It is necessary to make the QEMU block layer aware that there may be more
than one event loop.  This way a BlockDriverState can be used from a thread
without contention on the QEMU global mutex.

This series builds on the aio_context_acquire/release() interface that
allows a thread to temporarily grab an AioContext.  We add
bdrv_set_aio_context(bs, aio_context) for changing which AioContext a
BlockDriverState uses.

The final patches convert virtio-blk data-plane to use the QEMU block layer
and let the BlockDriverState run in the IOThread AioContext.

What's next?
------------
I have already made block I/O throttling work in another AioContext and will
send that series out next week.  In order to keep this series reviewable,
I'm holding those patches back for now.  One could say I'm "throttling"
them.  Thank you, thank you, I'll be here all night!
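Sketch 1: event loop API conversion
-----------------------------------
A minimal sketch of the conversion listed above, using a fictitious driver.
"MyDriverState", my_read_cb(), my_bh_cb(), and my_timer_cb() are made-up
names for illustration only; bdrv_get_aio_context() and the aio_*() calls
are the real interfaces, though exact argument lists may vary between QEMU
versions, so check the headers in your tree.

#include "block/block_int.h"
#include "block/aio.h"
#include "qemu/timer.h"

typedef struct MyDriverState {
    int fd;
    QEMUBH *bh;
    QEMUTimer *timer;
} MyDriverState;

static void my_read_cb(void *opaque)  { /* fd became readable */ }
static void my_bh_cb(void *opaque)    { /* deferred completion work */ }
static void my_timer_cb(void *opaque) { /* e.g. a keepalive ping */ }

static void my_register_event_handling(BlockDriverState *bs)
{
    MyDriverState *s = bs->opaque;
    AioContext *ctx = bdrv_get_aio_context(bs);

    /* was: qemu_aio_set_fd_handler(s->fd, my_read_cb, NULL, s); */
    aio_set_fd_handler(ctx, s->fd, my_read_cb, NULL, s);

    /* was: s->bh = qemu_bh_new(my_bh_cb, s); */
    s->bh = aio_bh_new(ctx, my_bh_cb, s);

    /* was: s->timer = timer_new_ms(QEMU_CLOCK_REALTIME, my_timer_cb, s); */
    s->timer = aio_timer_new(ctx, QEMU_CLOCK_REALTIME, SCALE_MS,
                             my_timer_cb, s);
}

/*
 * Synchronous waits change in the same way:
 *   was: qemu_aio_wait();
 *   now: aio_poll(bdrv_get_aio_context(bs), true);
 */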
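Sketch 2: .bdrv_detach/attach_aio_context()
-------------------------------------------
For a driver that keeps an fd handler and a keepalive timer registered even
while requests are drained, the detach/attach callbacks might look roughly
like this.  It continues the made-up driver from the first sketch; the two
BlockDriver members are the ones this series introduces, but treat the exact
signatures and the timer interval shown here as assumptions and compare
against block_int.h in the actual patches.

static void my_detach_aio_context(BlockDriverState *bs)
{
    MyDriverState *s = bs->opaque;

    /* Unregister from the old AioContext before the switch */
    aio_set_fd_handler(bdrv_get_aio_context(bs), s->fd, NULL, NULL, NULL);
    timer_del(s->timer);
    timer_free(s->timer);
    s->timer = NULL;
}

static void my_attach_aio_context(BlockDriverState *bs,
                                  AioContext *new_context)
{
    MyDriverState *s = bs->opaque;

    /* Re-register everything in the new AioContext */
    aio_set_fd_handler(new_context, s->fd, my_read_cb, NULL, s);
    s->timer = aio_timer_new(new_context, QEMU_CLOCK_REALTIME, SCALE_MS,
                             my_timer_cb, s);
    timer_mod(s->timer,
              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + 5000 /* made up */);
}

static BlockDriver bdrv_mydriver = {
    .format_name             = "mydriver",
    .instance_size           = sizeof(MyDriverState),
    /* ... the usual I/O callbacks ... */
    .bdrv_detach_aio_context = my_detach_aio_context,
    .bdrv_attach_aio_context = my_attach_aio_context,
};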
Stefan Hajnoczi (22):
  block: use BlockDriverState AioContext
  block: acquire AioContext in bdrv_close_all()
  block: add bdrv_set_aio_context()
  blkdebug: use BlockDriverState's AioContext
  blkverify: implement .bdrv_detach/attach_aio_context()
  curl: implement .bdrv_detach/attach_aio_context()
  gluster: use BlockDriverState's AioContext
  iscsi: implement .bdrv_detach/attach_aio_context()
  nbd: implement .bdrv_detach/attach_aio_context()
  nfs: implement .bdrv_detach/attach_aio_context()
  qed: use BlockDriverState's AioContext
  quorum: implement .bdrv_detach/attach_aio_context()
  block/raw-posix: implement .bdrv_detach/attach_aio_context()
  block/linux-aio: fix memory and fd leak
  rbd: use BlockDriverState's AioContext
  sheepdog: implement .bdrv_detach/attach_aio_context()
  ssh: use BlockDriverState's AioContext
  vmdk: implement .bdrv_detach/attach_aio_context()
  dataplane: use the QEMU block layer for I/O
  dataplane: delete IOQueue since it is no longer used
  dataplane: implement async flush
  raw-posix: drop raw_get_aio_fd() since it is no longer used

 block.c                          |  88 +++++++++++++--
 block/blkdebug.c                 |   2 +-
 block/blkverify.c                |  47 +++++---
 block/curl.c                     | 194 +++++++++++++++++++-------------
 block/gluster.c                  |   7 +-
 block/iscsi.c                    |  79 +++++++++----
 block/linux-aio.c                |  24 +++-
 block/nbd-client.c               |  24 +++-
 block/nbd-client.h               |   4 +
 block/nbd.c                      |  87 +++++++++------
 block/nfs.c                      |  80 ++++++++++----
 block/qed-table.c                |   8 +-
 block/qed.c                      |  35 +++++-
 block/quorum.c                   |  48 ++++++--
 block/raw-aio.h                  |   3 +
 block/raw-posix.c                |  82 ++++++++------
 block/rbd.c                      |   5 +-
 block/sheepdog.c                 | 118 +++++++++++++-------
 block/ssh.c                      |  36 +++---
 block/vmdk.c                     |  23 ++++
 hw/block/dataplane/Makefile.objs |   2 +-
 hw/block/dataplane/ioq.c         | 117 --------------------
 hw/block/dataplane/ioq.h         |  57 ----------
 hw/block/dataplane/virtio-blk.c  | 233 +++++++++++++++------------------------
 include/block/block.h            |  20 ++--
 include/block/block_int.h        |  36 ++++++
 26 files changed, 829 insertions(+), 630 deletions(-)
 delete mode 100644 hw/block/dataplane/ioq.c
 delete mode 100644 hw/block/dataplane/ioq.h

-- 
1.9.0