On 18.12.2014 at 11:34, Kevin Wolf wrote:
> On 16.12.2014 at 17:00, Peter Lieven wrote:
>> On 16.12.2014 16:48, Kevin Wolf wrote:
>>> On 16.12.2014 at 16:21, Peter Lieven wrote:
>>>> This series adds the long missing multiread support to virtio-blk.
>>>>
>>>> Some remarks:
>>>>  - I introduced rd_merged and wr_merged block accounting stats to
>>>>    blockstats as a generic interface which can be set from any
>>>>    driver that will introduce multirequest merging in the future.
>>>>  - The knob to disable request merging is not yet there. I would
>>>>    add it to the device properties, also as a generic interface,
>>>>    to have the same switch for any driver that might introduce
>>>>    request merging in the future. As there has been no knob in
>>>>    the past, I would post this as a separate series, as it needs
>>>>    some mangling in parameter parsing which might lead to further
>>>>    discussions.
>>>>  - The old multiwrite interface is still there and might be removed.
>>>>
>>>> v1->v2:
>>>>  - add overflow checking for nb_sectors [Kevin]
>>>>  - do not change the name of the macro for max mergeable requests. [Fam]
>>> Diff to v1 looks good. Now I just need to check what it does to the
>>> performance. Did you run any benchmarks yourself?
>> I ran several installs of Debian/Ubuntu and booted Windows and Linux
>> systems. I looked at rd_total_time_ns and wr_total_time_ns and saw
>> no increase. Often I even saw a decrease.
>>
>> {rd,wr}_total_time_ns measures the time from virtio_blk_handle_request
>> to virtio_blk_rw_complete. So it seems to be a good indicator for the
>> time spent with I/O.
>>
>> What you could do is put it on top of your fio testing stack and
>> look at the throughput. Sequential reads should be faster, the rest
>> not worse.
> So I finally ran some fio benchmarks on the series. The result for small
> sequential reads (4k) is quite noisy, but it seems to be improved a bit.
> Larger sequential reads (64k) and random reads seem to be mostly
> unaffected.
>
> For writes, however, I can see a degradation. Perhaps running multiple
> jobs in parallel means that we don't detect and merge sequential
> requests any more when they are interleaved with another sequential job.
> Or do you have an idea what else could have changed for writes?
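(For reference, the counters discussed above, rd/wr_total_time_ns,
rd/wr_operations and the new rd_merged/wr_merged, can be read per device
with the QMP command query-blockstats. Below is a minimal sketch of a QMP
client; the socket path /tmp/qmp.sock and the choice of printed fields are
assumptions, adjust them to the -qmp option of the running instance.)

#!/usr/bin/env python
# Minimal QMP client printing the block accounting counters discussed
# in this thread. Socket path and field selection are placeholders.
import json
import socket

QMP_SOCK = "/tmp/qmp.sock"   # e.g. started with -qmp unix:/tmp/qmp.sock,server,nowait

def qmp_command(chan, cmd):
    """Send a QMP command and wait for its return value, skipping events."""
    chan.write(json.dumps({"execute": cmd}) + "\n")
    chan.flush()
    while True:
        reply = json.loads(chan.readline())
        if "error" in reply:
            raise RuntimeError(reply["error"])
        if "return" in reply:
            return reply["return"]

sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
sock.connect(QMP_SOCK)
chan = sock.makefile("rw")              # text-mode wrapper around the socket

json.loads(chan.readline())             # consume the QMP greeting banner
qmp_command(chan, "qmp_capabilities")   # leave capabilities negotiation mode

for dev in qmp_command(chan, "query-blockstats"):
    s = dev["stats"]
    # rd_merged/wr_merged only exist with this series applied, hence .get()
    print("%s: wr_total_time_ns=%d wr_operations=%d rd_merged=%d wr_merged=%d"
          % (dev.get("device", "?"), s["wr_total_time_ns"],
             s["wr_operations"], s.get("rd_merged", 0), s.get("wr_merged", 0)))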
I tried to dig a little more into this topic and maybe found what's going on.
If I am right, you are using a kernel >= 3.17 in the guest for your tests?

Here are my test results for 4k sequential writes under a Linux 3.13 guest.

Master:
virtio-fio: rd_bytes=856064 wr_bytes=34359738368 rd_operations=209
            wr_operations=8388608 flush_operations=0
            wr_total_time_ns=980610962941 rd_total_time_ns=5656675
            flush_total_time_ns=0 rd_merged=0 wr_merged=6533338

Multiread_v2:
virtio-fio: rd_bytes=856064 wr_bytes=34359738368 rd_operations=209
            wr_operations=8388608 flush_operations=0
            wr_total_time_ns=558830918737 rd_total_time_ns=6159151
            flush_total_time_ns=0 rd_merged=0 wr_merged=6266824

As you can see, the number of merged requests is in the same order, but
wr_total_time_ns is heavily improved (roughly 559s vs. 981s in total)!

What happened between Linux 3.13 and 3.17 is that Ming introduced the
multiqueue feature into the virtio-blk kernel code. The blk-mq developers
intentionally set the number of hw_queues for virtio-blk to 4 in kernel 3.13.
With the introduction of the multiqueue feature in 3.17, the number of
hw_queues is set to the number of virtqueues. If multiqueue is unsupported,
that number is 1, and thus the number of hw_queues is also 1.

So all the requests from all fio processes go into the same hw_queue, and
this seems to break the performance. The requests are heavily interleaved
this way, and as I only merge strictly sequential requests, the old
implementation, which sorts the requests, wins. But it uses twice the
computation time to do so.

I think this needs to be fixed in the virtio-blk kernel code, and if we
introduce multiqueue for virtio-blk into QEMU, we should set the number of
virtqueues to at least 4. Maybe the number of CPUs would also be a good
choice?!

Peter
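PS: Purely as an illustration (not QEMU code): the new path only merges
requests that arrive strictly back to back, while the old multiwrite path
sorts the whole batch before merging. With two sequential writers sharing a
single hw_queue (as with a >= 3.17 guest and one virtqueue) their requests
interleave, so the strictly-sequential check almost never fires. A small
sketch with made-up offsets (4k requests, i.e. 8 sectors each):

# Illustrative only: compare "merge only strictly sequential neighbours"
# (new multiread path) with "sort the batch, then merge" (old multiwrite).
# A request is modelled as (start_sector, nb_sectors).

def count_merges(reqs):
    """Count merges when only strictly sequential neighbours are merged."""
    merges = 0
    for prev, cur in zip(reqs, reqs[1:]):
        if prev[0] + prev[1] == cur[0]:   # cur starts exactly where prev ends
            merges += 1
    return merges

# Two fio jobs writing 4k sequentially at different offsets, interleaved
# because they are funnelled through the same single hw_queue.
job_a = [(i * 8, 8) for i in range(64)]
job_b = [(100000 + i * 8, 8) for i in range(64)]
interleaved = [req for pair in zip(job_a, job_b) for req in pair]

print("in submission order: %d merges" % count_merges(interleaved))          # 0
print("after sorting first: %d merges" % count_merges(sorted(interleaved)))  # 126

With four hw_queues, as in the 3.13 guest above, each queue presumably sees
mostly one sequential stream, so both approaches merge about equally well.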