On 30/07/14 13:39, Ming Lei wrote:
> These patches bring up the following 4 changes:
>
> - introduce a selective coroutine bypass mechanism
>   to improve performance of virtio-blk dataplane with
>   raw format images
>
> - introduce an object allocation pool and apply it to
>   virtio-blk dataplane to improve its performance
>
> - linux-aio changes: fix the -EAGAIN and partial
>   completion cases, increase max events to 256, and remove one
>   unused field in 'struct qemu_laiocb'
>
> - support multi virtqueue for virtio-blk dataplane
>
> The virtio-blk multi virtqueue feature will be added to virtio spec 1.1[1],
> and the 3.17 Linux kernel[2] will support the feature in the virtio-blk
> driver. For those who want to play with it, the kernel side patches can be
> found in either Jens's block tree[3] or linux-next[4].
>
> The fio script below, run from inside the VM, is used to test the
> improvement from these patches:
>
> [global]
> direct=1
> size=128G
> bsrange=4k-4k
> timeout=120
> numjobs=${JOBS}
> ioengine=libaio
> iodepth=64
> filename=/dev/vdc
> group_reporting=1
>
> [f]
> rw=randread
>
> One quad-core VM (8G RAM) is created on the host below to run the above
> fio test:
>
> - server (16 cores: 8 physical cores, 2 threads per physical core)
>
> Below is the test result on throughput improvement (IOPS) with
> this patchset (4 virtqueues per virtio-blk device) against QEMU
> 2.1.0-rc5: a 30% throughput improvement can be observed, and
> scalability for parallel I/Os improves even more (an 80% throughput
> improvement is observed in the case of 4 jobs).
>
> From the above result, we can see that both scalability and performance
> improve a lot.
>
> After commit 580b6b2aa2 (dataplane: use the QEMU block
> layer for I/O), the average time for submitting a single
> request has increased a lot; according to my traces, it has
> roughly doubled, even though the block plug & unplug mechanism
> was introduced to ease the effect. That is why this patchset
> first introduces the selective coroutine bypass mechanism and
> the object allocation pool to save that time. Based on QEMU 2.0,
> the virtio-blk dataplane multi virtqueue patch alone achieved a
> bigger improvement than the current result[5].
>
> TODO:
> - optimize the block layer for linux-aio, so that
>   more time can be saved when submitting requests
> - support more than one AioContext to improve
>   virtio-blk performance
[...]
>
> [1], http://marc.info/?l=linux-api&m=140486843317107&w=2
> [2], http://marc.info/?l=linux-api&m=140418368421229&w=2
> [3], http://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git/#for-3.17/drivers
> [4], https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/
> [5], http://marc.info/?l=linux-api&m=140377573830230&w=2
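For anyone who wants to try a setup similar to the quoted one, a minimal sketch of the host invocation is shown below. It assumes the num-queues and iothread properties of virtio-blk-pci as found in later upstream QEMU; the series under review may expose the virtqueue count under a different property name, and /dev/sdX plus the VM sizing are placeholders taken from the quoted test description.

# create one iothread (dataplane) and attach a 4-queue virtio-blk disk to it
qemu-system-x86_64 -enable-kvm -m 8G -smp 4 \
    -object iothread,id=iothread0 \
    -drive if=none,id=drive0,file=/dev/sdX,format=raw,cache=none,aio=native \
    -device virtio-blk-pci,drive=drive0,iothread=iothread0,num-queues=4

Inside the guest, the quoted fio job file can then be run against the resulting virtio disk; since fio expands ${JOBS} from the environment, a 4-job run would be started as, e.g., "JOBS=4 fio randread.fio" (the job file name is only illustrative).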
FYI, I just tested with one virtqueue on s390 (3.15 as guest). It was just a quick sniff test, but we are getting closer to the fio results that we had before commit 580b6b2aa2 (dataplane: use the QEMU block layer for I/O). I can't give proper numbers right now, as I am on a shared storage subsystem, but this looks like we are on the right track.

I have not looked at the code, though.

Christian