On Thu, Jun 02, 2016 at 05:19:41PM -0700, Stefan Hajnoczi wrote:
> On Mon, May 30, 2016 at 06:25:58PM -0700, Stefan Hajnoczi wrote:
> > v2:
> >  * Simplify s->rq live migration [Paolo]
> >  * Use more efficient bitmap ops for batch notification [Paolo]
> >  * Fix perf regression due to batch notify BH in wrong AioContext
> >    [Christian]
> >
> > The virtio_blk guest driver has supported multiple virtqueues since
> > Linux 3.17.  This patch series adds multiple virtqueues to QEMU's
> > virtio-blk emulated device.
> >
> > Ming Lei sent patches previously but these were not merged.  This
> > series implements virtio-blk multiqueue for QEMU from scratch since
> > the codebase has changed.  Live migration support for s->rq was also
> > missing from the previous series and has been added.
> >
> > It's important to note that QEMU's block layer does not support
> > multiqueue yet.  Therefore virtio-blk device processes all virtqueues
> > in the same AioContext (IOThread).  Further work is necessary to take
> > advantage of multiqueue support in QEMU's block layer once it becomes
> > available.
> >
> > I will post performance results once they are ready.
> >
> > Stefan Hajnoczi (8):
> >   virtio-blk: use batch notify in non-dataplane case
> >   virtio-blk: tell dataplane which vq to notify
> >   virtio-blk: associate request with a virtqueue
> >   virtio-blk: add VirtIOBlockConf->num_queues
> >   virtio-blk: multiqueue batch notify
> >   virtio-blk: live migrateion s->rq with multiqueue
> >   virtio-blk: dataplane multiqueue support
> >   virtio-blk: add num-queues device property
> >
> >  hw/block/dataplane/virtio-blk.c |  68 +++++++++++----------
> >  hw/block/dataplane/virtio-blk.h |   2 +-
> >  hw/block/virtio-blk.c           | 129 +++++++++++++++++++++++++++++++++++-----
> >  include/hw/virtio/virtio-blk.h  |   8 ++-
> >  4 files changed, 159 insertions(+), 48 deletions(-)
>
> There is a significant performance regression due to batch notify:
>
> $ ./analyze.py runs/
> Name                       IOPS        Error
> unpatched-d6550e9ed2       19269820.2  ± 1.36%
> unpatched-d6550e9ed2-2     19567358.4  ± 2.42%
> v2-batch-only-f27ed9a4d9   16252227.2  ± 6.09%
> v2-no-dataplane            14560225.4  ± 5.16%
> v2-no-dataplane-2          14622535.6  ± 10.08%
> v2-no-dataplane-3          13960670.8  ± 7.11%
>
> unpatched-d6550e9ed2 is without this patch series.
> v2-batch-only-f27ed9a4d9 is with Patch 1 only.  v2-no-dataplane is with
> the patch series (dataplane is not enabled in any of these tests).
>
> Next I will compare unpatched dataplane against patched dataplane.  I
> want to make sure Patch 1 faithfully moved batch notify from dataplane
> code to generic virtio-blk code without affecting performance.
>
> If there is no difference then it means batch notify decreases
> performance for some workloads (obviously not the same workload that
> Ming Lei was running).
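For readers who haven't looked at Patch 1: the batch notify mechanism it
moves out of dataplane works roughly like this (a simplified illustration
only, not the actual patch code; the field and helper names are
approximate):

/* Completion path: instead of notifying the guest for every completed
 * request, remember which virtqueue needs a notification and schedule a
 * BH.  Scheduling an already-scheduled BH is a no-op, so many completions
 * in one event loop iteration collapse into a single BH invocation.
 */
static void virtio_blk_batch_notify(VirtIOBlock *s, VirtQueue *vq)
{
    set_bit(virtio_get_queue_index(vq), s->batch_notify_vqs);
    qemu_bh_schedule(s->batch_notify_bh);
}

/* BH: issue one guest notification per virtqueue that completed requests
 * since the last run, then clear the bitmap.
 */
static void virtio_blk_batch_notify_bh(void *opaque)
{
    VirtIOBlock *s = opaque;
    VirtIODevice *vdev = VIRTIO_DEVICE(s);
    unsigned nvqs = s->conf.num_queues;
    unsigned i;

    for (i = find_first_bit(s->batch_notify_vqs, nvqs);
         i < nvqs;
         i = find_next_bit(s->batch_notify_vqs, nvqs, i + 1)) {
        virtio_notify(vdev, virtio_get_queue(vdev, i));
    }
    bitmap_zero(s->batch_notify_vqs, nvqs);
}

The point is that the guest sees one notification per batch of completions
instead of one per request.  The cost is that the notification now happens
from a BH, which is where the ordering problem described below comes in.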
It turns out that Patch 1 slows down dataplane even though the code looks
equivalent.  After a lot of poking it turned out to be a subtle issue:

The order of BHs in the AioContext->first_bh list affects performance.
Linux AIO (block/linux-aio.c) invokes completion callbacks from a BH.
Performance is much better if virtio-blk.c's batch BH runs after the
completion BH.

The "fast" ordering notifies the guest within ~300 nanoseconds of the last
request completion.  The "slow" ordering sometimes takes 100 microseconds
after the last request completion before the guest is notified.  It
probably depends on whether the event loop is kicked by another source.

I'm thinking of scrapping the batch BH and instead using a notify
plug/unplug callback to suppress notification until the last request has
been processed (see the sketch in the P.S. below).

I also checked that batch notification does indeed improve performance
compared to no batching.  It offers a nice boost, so we do want to port
the feature from dataplane to non-dataplane.

For the time being: consider this patch series broken due to the
performance regression.

Stefan
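P.S. For concreteness, the plug/unplug idea would look something like this
(an untested, illustrative sketch; the helper names are invented and the
eventual patch may differ):

/* Call before processing a batch of requests/completions. */
static void virtio_blk_notify_plug(VirtIOBlock *s)
{
    s->notify_plugged++;
}

/* Completion path: notify immediately when unplugged, otherwise just
 * record which virtqueue needs a notification.
 */
static void virtio_blk_notify(VirtIOBlock *s, VirtQueue *vq)
{
    if (s->notify_plugged) {
        set_bit(virtio_get_queue_index(vq), s->batch_notify_vqs);
    } else {
        virtio_notify(VIRTIO_DEVICE(s), vq);
    }
}

/* Call after the last request in the batch has been processed.  Flushes
 * deferred notifications immediately instead of waiting for a BH, so the
 * notification latency no longer depends on BH ordering.
 */
static void virtio_blk_notify_unplug(VirtIOBlock *s)
{
    VirtIODevice *vdev = VIRTIO_DEVICE(s);
    unsigned nvqs = s->conf.num_queues;
    unsigned i;

    assert(s->notify_plugged > 0);
    if (--s->notify_plugged > 0) {
        return;
    }

    for (i = find_first_bit(s->batch_notify_vqs, nvqs);
         i < nvqs;
         i = find_next_bit(s->batch_notify_vqs, nvqs, i + 1)) {
        virtio_notify(vdev, virtio_get_queue(vdev, i));
    }
    bitmap_zero(s->batch_notify_vqs, nvqs);
}

The Linux AIO completion path would bracket its loop over completed
requests with plug/unplug, so the guest is notified as soon as the last
completion in the batch has been handled.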