On Thu, Nov 22, 2012 at 4:16 PM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
> This series adds the -device virtio-blk-pci,x-data-plane=on property that
> enables a high performance I/O codepath. A dedicated thread is used to
> process virtio-blk requests outside the global mutex and without going
> through the QEMU block layer.
>
> Khoa Huynh <k...@us.ibm.com> reported an increase from 140,000 IOPS to 600,000
> IOPS for a single VM using virtio-blk-data-plane in July:
>
>   http://comments.gmane.org/gmane.comp.emulators.kvm.devel/94580
>
> The virtio-blk-data-plane approach was originally presented at Linux Plumbers
> Conference 2010. The following slides contain a brief overview:
>
>   http://linuxplumbersconf.org/2010/ocw/system/presentations/651/original/Optimizing_the_QEMU_Storage_Stack.pdf
>
> The basic approach is:
> 1. Each virtio-blk device has a thread dedicated to handling ioeventfd
>    signalling when the guest kicks the virtqueue.
> 2. Requests are processed without going through the QEMU block layer using
>    Linux AIO directly.
> 3. Completion interrupts are injected via irqfd from the dedicated thread.
>
> To try it out:
>
>   qemu -drive if=none,id=drive0,cache=none,aio=native,format=raw,file=...
>        -device virtio-blk-pci,drive=drive0,scsi=off,x-data-plane=on
>
> Limitations:
>  * Only format=raw is supported
>  * Live migration is not supported
>  * Block jobs, hot unplug, and other operations fail with -EBUSY
>  * I/O throttling limits are ignored
>  * Only Linux hosts are supported due to Linux AIO usage
>
> The code has reached a stage where I feel it is ready to merge. Users have
> been playing with it for some time and want the significant performance boost.
>
> We are refactoring QEMU to get rid of the global mutex. I believe that
> virtio-blk-data-plane can eventually become the default mode of operation.
>
> Instead of waiting for global mutex removal efforts to finish, I want to use
> virtio-blk-data-plane as an example device for AioContext and threaded hw
> dispatch refactoring. This means:
>
> 1. When the block layer can bind to an AioContext and execute I/O outside the
>    global mutex, virtio-blk-data-plane can use this (and gain image format
>    support).
>
> 2. When hw dispatch no longer needs the global mutex we can use hw/virtio.c
>    again and perhaps run a pool of iothreads instead of dedicated data plane
>    threads.
>
> But in the meantime, I have cleaned up the virtio-blk-data-plane code so that
> it can be merged as an experimental feature.
>
> v4:
>  * Add qemu_iovec_concat_iov() [Paolo]
>  * Use QEMUIOVector to copy out virtio_blk_inhdr [Michael, Paolo]
>
> v3:
>  * Don't assume iovec layout [Michael]
>  * Better naming for hostmem.c MemoryListener callbacks [Don]
>  * More vring quarantining if commands are bogus instead of exiting [Blue]
>
> v2:
>  * Use MemoryListener for thread-safe memory mapping [Paolo, Anthony, and
>    everyone else pointed this out ;-)]
>  * Quarantine invalid vring instead of exiting [Blue]
>  * Replace __u16 kernel types with uint16_t [Blue]
>
> Changes from the RFC v9:
>  * Add x-data-plane=on|off option and coexist with regular virtio-blk code
>  * Create thread from BH so it inherits iothread cpusets
>  * Drain requests on vm_stop() so stopped guest does not access image file
>  * Add migration blocker
>  * Add bdrv_in_use() to prevent block jobs and other operations that can
>    interfere
>  * Drop IOQueue request merging for simplicity
>  * Drop ioctl interrupt injection and always use irqfd for simplicity
>  * Major cleanup to split up source files
>  * Rebase from qemu-kvm.git onto qemu.git
>  * Address Michael Tsirkin's review comments
>
> Stefan Hajnoczi (11):
>   raw-posix: add raw_get_aio_fd() for virtio-blk-data-plane
>   configure: add CONFIG_VIRTIO_BLK_DATA_PLANE
>   dataplane: add host memory mapping code
>   dataplane: add virtqueue vring code
>   dataplane: add event loop
>   dataplane: add Linux AIO request queue
>   iov: add iov_discard() to remove data
>   test-iov: add iov_discard() testcase
>   iov: add qemu_iovec_concat_iov()
>   dataplane: add virtio-blk data plane code
>   virtio-blk: add x-data-plane=on|off performance feature
>
>  block.h                    |    9 +
>  block/raw-posix.c          |   34 ++++
>  configure                  |   21 +++
>  hw/Makefile.objs           |    2 +-
>  hw/dataplane/Makefile.objs |    3 +
>  hw/dataplane/event-poll.c  |  109 ++++++++++++
>  hw/dataplane/event-poll.h  |   40 +++++
>  hw/dataplane/hostmem.c     |  165 ++++++++++++++++++
>  hw/dataplane/hostmem.h     |   52 ++++++
>  hw/dataplane/ioq.c         |  118 +++++++++++++
>  hw/dataplane/ioq.h         |   57 ++++++
>  hw/dataplane/virtio-blk.c  |  427 +++++++++++++++++++++++++++++++++++++++++++++
>  hw/dataplane/virtio-blk.h  |   41 +++++
>  hw/dataplane/vring.c       |  344 ++++++++++++++++++++++++++++++++++++
>  hw/dataplane/vring.h       |   62 +++++++
>  hw/virtio-blk.c            |   59 ++++++-
>  hw/virtio-blk.h            |    1 +
>  hw/virtio-pci.c            |    3 +
>  iov.c                      |   80 +++++++--
>  iov.h                      |   13 ++
>  qemu-common.h              |    3 +
>  tests/test-iov.c           |  129 ++++++++++++++
>  trace-events               |    9 +
>  23 files changed, 1767 insertions(+), 14 deletions(-)
>  create mode 100644 hw/dataplane/Makefile.objs
>  create mode 100644 hw/dataplane/event-poll.c
>  create mode 100644 hw/dataplane/event-poll.h
>  create mode 100644 hw/dataplane/hostmem.c
>  create mode 100644 hw/dataplane/hostmem.h
>  create mode 100644 hw/dataplane/ioq.c
>  create mode 100644 hw/dataplane/ioq.h
>  create mode 100644 hw/dataplane/virtio-blk.c
>  create mode 100644 hw/dataplane/virtio-blk.h
>  create mode 100644 hw/dataplane/vring.c
>  create mode 100644 hw/dataplane/vring.h
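
Steps 1 and 3 of the quoted "basic approach" can be pictured with a small
sketch. This is not code from the series: plain eventfds stand in for the KVM
ioeventfd and irqfd, no vring or Linux AIO is involved, and struct demo_plane,
demo_signal(), demo_wait(), demo_plane_thread() and demo_run() are hypothetical
names made up for illustration. It assumes a Linux host and a build with
-pthread.

```c
/* Sketch only: plain eventfds stand in for KVM's ioeventfd and irqfd. */
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical per-device state; not the structure used by the series. */
struct demo_plane {
    int kick_fd;        /* stands in for the ioeventfd (guest -> host) */
    int irq_fd;         /* stands in for the irqfd (host -> guest) */
    uint64_t expected;  /* number of kicks after which the thread exits */
    uint64_t handled;   /* kicks consumed so far by the thread */
};

/* Add n to the eventfd counter, waking any blocked reader. */
static void demo_signal(int fd, uint64_t n)
{
    if (write(fd, &n, sizeof(n)) != sizeof(n)) {
        abort();
    }
}

/* Block until the eventfd counter is non-zero, then return and reset it. */
static uint64_t demo_wait(int fd)
{
    uint64_t n;

    if (read(fd, &n, sizeof(n)) != sizeof(n)) {
        abort();
    }
    return n;
}

/* Dedicated thread: wait for kicks, then signal completion. */
static void *demo_plane_thread(void *opaque)
{
    struct demo_plane *p = opaque;

    while (p->handled < p->expected) {
        /* Several kicks may be coalesced into one wakeup. */
        uint64_t n = demo_wait(p->kick_fd);

        /* A real data plane would pop vring requests and submit I/O here. */
        p->handled += n;
        demo_signal(p->irq_fd, n);
    }
    return NULL;
}

/* Kick the thread `kicks` times; return how many completions came back. */
static uint64_t demo_run(uint64_t kicks)
{
    struct demo_plane p = { .expected = kicks };
    uint64_t completed = 0;
    pthread_t tid;

    p.kick_fd = eventfd(0, 0);
    p.irq_fd = eventfd(0, 0);
    assert(p.kick_fd >= 0 && p.irq_fd >= 0);

    if (pthread_create(&tid, NULL, demo_plane_thread, &p) != 0) {
        abort();
    }
    for (uint64_t i = 0; i < kicks; i++) {
        demo_signal(p.kick_fd, 1);      /* the "guest" kicks the queue */
    }
    while (completed < kicks) {
        completed += demo_wait(p.irq_fd); /* the "guest" takes interrupts */
    }
    pthread_join(tid, NULL);
    close(p.kick_fd);
    close(p.irq_fd);
    return completed;
}
```

Calling demo_run(5) from main() returns 5. Because an eventfd accumulates a
counter, the thread adds the value it reads instead of counting wakeups, so the
total is the same whether or not kicks are coalesced.
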

Michael, Paolo: Are you happy with v4?

Kevin: Do you want to take this series through the block tree?

Thanks,
Stefan