Hi,

These patches bring the following four changes:
- introduce a selective coroutine bypass mechanism to improve the
  performance of virtio-blk dataplane with raw format images
- introduce an object allocation pool and apply it to virtio-blk
  dataplane to improve its performance
- linux-aio changes: fix the -EAGAIN and partial-completion cases,
  increase max events to 256, and remove one unused field in
  'struct qemu_laiocb'
- support multiple virtqueues for virtio-blk dataplane

(Illustrative sketches of the ideas behind these changes are appended
after the reference list.)

The virtio-blk multi virtqueue feature will be added to virtio spec
1.1[1], and the 3.17 Linux kernel[2] will support the feature in the
virtio-blk driver. For those who want to play with it, the kernel-side
patches can be found in either Jens's block tree[3] or linux-next[4].

The fio script below, run from inside the VM, is used to measure the
improvement from these patches:

    [global]
    direct=1
    size=128G
    bsrange=4k-4k
    timeout=120
    numjobs=${JOBS}
    ioengine=libaio
    iodepth=64
    filename=/dev/vdc
    group_reporting=1

    [f]
    rw=randread

One quad-core VM (8G RAM) is created on the host below to run the
above fio test:

- server (16 cores: 8 physical cores, 2 threads per physical core)

Comparing throughput (IOPS) with this patchset (4 virtqueues per
virtio-blk device) against QEMU 2.1.0-rc5: a 30% throughput
improvement can be observed, and scalability for parallel I/Os
improves even more (an 80% throughput improvement is observed in the
4-job case).

From the above results we can see that both scalability and
performance improve a lot. After commit 580b6b2aa2 ("dataplane: use
the QEMU block layer for I/O"), the average time for submitting a
single request increased considerably; in my traces it roughly
doubled, even though the block plug & unplug mechanism was introduced
to ease the effect. That is why this patchset first introduces the
selective coroutine bypass mechanism and the object allocation pool,
to win that time back. On QEMU 2.0, the single virtio-blk dataplane
multi virtqueue patch alone could already achieve a bigger improvement
than the current result[5].

TODO:
- optimize the block layer for linux-aio so that more time can be
  saved when submitting requests
- support more than one AioContext to further improve virtio-blk
  performance

 async.c                         |   1 +
 block.c                         | 129 ++++++++++++++++++-----
 block/linux-aio.c               |  93 +++++++++++-----
 block/raw-posix.c               |  34 ++++++
 hw/block/dataplane/virtio-blk.c | 221 ++++++++++++++++++++++++++++++---------
 hw/block/virtio-blk.c           |  32 +++++-
 hw/net/virtio-net.c             |   4 +-
 hw/virtio/dataplane/vring.c     |  23 +++-
 hw/virtio/virtio.c              |  23 ++--
 include/block/aio.h             |  13 +++
 include/block/block.h           |   9 ++
 include/block/coroutine.h       |   8 ++
 include/block/coroutine_int.h   |   5 +
 include/hw/virtio/virtio-blk.h  |  13 +++
 include/hw/virtio/virtio.h      |  13 ++-
 include/qemu/obj_pool.h         |  64 ++++++++++++
 qemu-coroutine-lock.c           |   4 +-
 qemu-coroutine.c                |  33 ++++++
 18 files changed, 600 insertions(+), 122 deletions(-)

[1] http://marc.info/?l=linux-api&m=140486843317107&w=2
[2] http://marc.info/?l=linux-api&m=140418368421229&w=2
[3] http://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git/
    #for-3.17/drivers
[4] https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/
[5] http://marc.info/?l=linux-api&m=140377573830230&w=2
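Appendix: rough sketches of the ideas above. These are illustrative
only, under assumptions spelled out in each snippet; they are not the
code in the patches.

1) Selective coroutine bypass. The idea is that, when the image is
raw and the request needs none of the block layer's coroutine-based
services, the coroutine create/enter round trip can be skipped and
the I/O submitted directly. A minimal sketch, with hypothetical
helper names (bdrv_io_can_bypass_co(), raw_submit_directly()):

    #include "block/block.h"

    /*
     * Hypothetical fast-path dispatcher: raw images on linux-aio are
     * submitted directly, everything else takes the normal
     * coroutine-based path.
     */
    static BlockDriverAIOCB *blk_fast_aio_readv(BlockDriverState *bs,
                                                int64_t sector_num,
                                                QEMUIOVector *qiov,
                                                int nb_sectors,
                                                BlockDriverCompletionFunc *cb,
                                                void *opaque)
    {
        if (bdrv_io_can_bypass_co(bs)) {
            /* no qemu_coroutine_create()/enter() on this path */
            return raw_submit_directly(bs, sector_num, qiov, nb_sectors,
                                       cb, opaque);
        }
        return bdrv_aio_readv(bs, sector_num, qiov, nb_sectors, cb, opaque);
    }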
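2) Object allocation pool. A minimal sketch of the idea behind
include/qemu/obj_pool.h, assuming a preallocated slab plus a stack of
free slots with a heap fallback (all names illustrative): the hot
path replaces a malloc/free pair per request with a pointer pop/push.

    #include <stdbool.h>
    #include <stdlib.h>

    typedef struct ObjPool {
        char *objs;        /* slab holding cnt objects of obj_size bytes */
        void **free_objs;  /* stack of free slots inside the slab */
        size_t obj_size;
        unsigned cnt;      /* total slots */
        unsigned free_cnt; /* slots currently on the free stack */
    } ObjPool;

    static void obj_pool_init(ObjPool *p, size_t obj_size, unsigned cnt)
    {
        unsigned i;

        p->objs = malloc(obj_size * cnt);
        p->free_objs = malloc(cnt * sizeof(void *));
        p->obj_size = obj_size;
        p->cnt = p->free_cnt = cnt;
        for (i = 0; i < cnt; i++) {
            p->free_objs[i] = p->objs + i * obj_size;
        }
    }

    static bool obj_pool_contains(ObjPool *p, void *obj)
    {
        return (char *)obj >= p->objs &&
               (char *)obj < p->objs + p->obj_size * p->cnt;
    }

    static void *obj_pool_get(ObjPool *p)
    {
        /* pop a preallocated slot; fall back to the heap when empty */
        return p->free_cnt ? p->free_objs[--p->free_cnt]
                           : malloc(p->obj_size);
    }

    static void obj_pool_put(ObjPool *p, void *obj)
    {
        if (obj_pool_contains(p, obj)) {
            p->free_objs[p->free_cnt++] = obj;
        } else {
            free(obj);
        }
    }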
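3) linux-aio submission. io_submit() may accept only part of a batch,
or fail with -EAGAIN when kernel resources are exhausted, so the
caller has to track how many iocbs actually went in and re-submit the
remainder later. A sketch of such a loop (helper name illustrative);
MAX_EVENTS here reflects the new 256-event ring:

    #include <errno.h>
    #include <libaio.h>

    #define MAX_EVENTS 256  /* passed to io_setup(); raised by this series */

    static int submit_batch(io_context_t ctx, struct iocb **iocbs, int nr)
    {
        int done = 0;

        while (done < nr) {
            int ret = io_submit(ctx, nr - done, iocbs + done);

            if (ret > 0) {
                done += ret;    /* partial submission: keep going */
            } else if (ret == 0 || ret == -EAGAIN) {
                break;          /* out of resources: retry these later */
            } else {
                return ret;     /* genuine error, e.g. -EINVAL */
            }
        }
        return done;            /* iocbs actually in flight */
    }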
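4) Multi virtqueue. On the QEMU side this boils down to realizing
num_queues request virtqueues instead of one, all wired to the same
output handler, so vCPUs can submit I/O in parallel without
contending on a single queue (queue count and size illustrative):

    /* one handler shared by all request queues */
    for (i = 0; i < num_queues; i++) {
        s->vqs[i] = virtio_add_queue(vdev, 128, virtio_blk_handle_output);
    }

Thanks,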