Hi, Stefan.

On Mon, Jun 20, 2016 at 12:36 PM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
> On Tue, Jun 07, 2016 at 05:28:24PM +0100, Stefan Hajnoczi wrote:
>> v3:
>>  * Drop Patch 1 to batch guest notify for non-dataplane
>>
>>    The Linux AIO completion BH and the virtio-blk batch notify BH changed
>>    order in the AioContext->first_bh list as a side-effect of moving the BH
>>    from hw/block/dataplane/virtio-blk.c to hw/block/virtio-blk.c.  This
>>    caused a serious performance regression for both dataplane and
>>    non-dataplane.
>>
>>    I've decided not to move the BH in this series and work on a separate
>>    solution for making batch notify generic.
>>
>>    The remaining patches have been reordered and cleaned up.
>>
>>  * See performance data below.
>>
>> v2:
>>  * Simplify s->rq live migration [Paolo]
>>  * Use more efficient bitmap ops for batch notification [Paolo]
>>  * Fix perf regression due to batch notify BH in wrong AioContext [Christian]
>>
>> The virtio_blk guest driver has supported multiple virtqueues since Linux
>> 3.17.  This patch series adds multiple virtqueues to QEMU's virtio-blk
>> emulated device.
>>
>> Ming Lei sent patches previously but these were not merged.  This series
>> implements virtio-blk multiqueue for QEMU from scratch since the codebase
>> has changed.  Live migration support for s->rq was also missing from the
>> previous series and has been added.
>>
>> It's important to note that QEMU's block layer does not support multiqueue
>> yet.  Therefore virtio-blk device processes all virtqueues in the same
>> AioContext (IOThread).  Further work is necessary to take advantage of
>> multiqueue support in QEMU's block layer once it becomes available.
>>
>> Performance results:
>>
>> Using virtio-blk-pci,num-queues=4 can produce a speed-up but -smp 4
>> introduces a lot of variance across runs.  No pinning was performed.
>>
>> Results show that there is no regression anymore, thanks to dropping the
>> batch notify BH patch.
>>
>> RHEL 7.2 guest on RHEL 7.2 host with 1 vcpu and 1 GB RAM unless otherwise
>> noted.  The default configuration of the Linux null_blk driver is used as
>> /dev/vdb.
>>
>> $ cat files/fio.job
>> [global]
>> filename=/dev/vdb
>> ioengine=libaio
>> direct=1
>> runtime=60
>> ramp_time=5
>> gtod_reduce=1
>>
>> [job1]
>> numjobs=4
>> iodepth=16
>> rw=randread
>> bs=4K
>>
>> $ ./analyze.py runs/
>> Name                            IOPS        Error
>> unpatched-d6550e9ed2            19269820.2  ± 1.36%
>> unpatched-dataplane-d6550e9ed2  22351400.4  ± 1.07%
>> v3-dataplane                    22318511.2  ± 0.77%
>> v3-no-dataplane                 18936103.8  ± 1.12%
>> v3-queues-4-no-dataplane        19177021.8  ± 1.45%
>> v3-smp-4-no-dataplane           25509585.2  ± 29.50%
>> v3-smp-4-no-dataplane-no-mq     12466177.2  ± 7.88%
>>
>> Configuration:
>> Name                            Patched?  Dataplane?  SMP?  MQ?
>> unpatched-d6550e9ed2            N         N           N     N
>> unpatched-dataplane-d6550e9ed2  N         Y           N     N
>> v3-dataplane                    Y         Y           N     N
>> v3-no-dataplane                 Y         N           N     N
>> v3-queues-4-no-dataplane        Y         N           N     Y
>> v3-smp-4-no-dataplane           Y         N           Y     Y
>> v3-smp-4-no-dataplane-no-mq     Y         N           Y     N
>>
>> SMP means -smp 4.
>> MQ means virtio-blk-pci,num-queues=4.
>>
>> Stefan Hajnoczi (7):
>>   virtio-blk: add VirtIOBlockConf->num_queues
>>   virtio-blk: multiqueue batch notify
>>   virtio-blk: tell dataplane which vq to notify
>>   virtio-blk: associate request with a virtqueue
>>   virtio-blk: live migrate s->rq with multiqueue
>>   virtio-blk: dataplane multiqueue support
>>   virtio-blk: add num-queues device property
>>
>>  hw/block/dataplane/virtio-blk.c | 81 +++++++++++++++++++++++++++++------------
>>  hw/block/dataplane/virtio-blk.h |  2 +-
>>  hw/block/virtio-blk.c           | 52 +++++++++++++++++++++-----
>>  include/hw/virtio/virtio-blk.h  |  6 ++-
>>  4 files changed, 105 insertions(+), 36 deletions(-)
>
> Ping?
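For anyone reproducing the multiqueue case quoted above, the
virtio-blk-pci,num-queues=4 configuration corresponds to an invocation roughly
like the one below.  This is only my reconstruction: the backing device, memory
size, and the remaining flags are guesses and were not part of Stefan's
posting.

$ qemu-system-x86_64 \
    -smp 4 -m 1G \
    -drive if=none,id=drive0,file=/dev/nullb0,format=raw,cache=none,aio=native \
    -device virtio-blk-pci,drive=drive0,num-queues=4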
I have one minor note regarding the following test:

  "Name                            Patched?  Dataplane?  SMP?  MQ?"
  "v3-queues-4-no-dataplane        Y         N           N     Y"

If I am not mistaken and understand your test description correctly, it does
not make much sense to use multiple queues with VCPUs=1 (i.e. SMP=N), because
the guest block layer will not create more queues than there are CPUs.  In
other words, even if you specify num_queues=4 for virtio_blk in this test, the
guest block layer creates only 1 software queue and maps it to only 1 HW queue,
even though 4 were requested.  On the host userspace side you will therefore
always receive IO on queue #0 (a rough sketch illustrating this is at the
bottom of this mail).

Also, I've rebased and tested all my changes on top of your latest v3 set.  The
question is: are you interested in this up-to-date RFC of mine, "simple
multithreaded MQ implementation for bdrv_raw", which I sent a couple of weeks
ago?  I can resend it once more as a single merged commit.

--
Roman
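P.S. A minimal standalone sketch of the queue mapping described above.  This is
plain C written for this mail, not the actual blk-mq or virtio_blk driver code,
and it simply assumes a round-robin CPU-to-hw-queue mapping.

/*
 * Illustrative sketch only -- not QEMU, blk-mq, or virtio_blk driver code.
 * Assumes one software queue per online CPU and a round-robin mapping onto
 * the hardware queues, to show why with 1 vCPU all IO lands on hw queue #0
 * even when the device advertises num-queues=4.
 */
#include <stdio.h>

static unsigned int map_cpu_to_hw_queue(unsigned int cpu,
                                        unsigned int nr_hw_queues)
{
    return cpu % nr_hw_queues;          /* assumed round-robin mapping */
}

int main(void)
{
    const unsigned int nr_hw_queues = 4; /* virtio-blk-pci,num-queues=4 */
    const unsigned int nr_cpus = 1;      /* 1 vCPU => 1 software queue  */
    unsigned int cpu;

    for (cpu = 0; cpu < nr_cpus; cpu++) {
        printf("software queue %u (cpu %u) -> hardware queue %u\n",
               cpu, cpu, map_cpu_to_hw_queue(cpu, nr_hw_queues));
    }
    /* Prints: software queue 0 (cpu 0) -> hardware queue 0 */
    return 0;
}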