On Thu, Mar 23, 2017 at 4:22 PM, Ashijeet Acharya <ashijeetacha...@gmail.com> wrote:
> On Thu, Mar 23, 2017 at 8:39 PM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
>> On Tue, Mar 21, 2017 at 09:14:08AM +0000, Ashijeet Acharya wrote:
>>> On Tue, 21 Mar 2017 at 13:21, Stefan Hajnoczi <stefa...@gmail.com> wrote:
>>>
>>> > On Sat, Mar 11, 2017 at 11:54 AM, Ashijeet Acharya
>>> > <ashijeetacha...@gmail.com> wrote:
>>> > > This series optimizes the I/O performance of the VMDK driver.
>>> > >
>>> > > Patch 1 makes the VMDK driver allocate multiple clusters at once.
>>> > > Earlier it used to allocate cluster by cluster, which slowed down
>>> > > its performance to a great extent.
>>> > >
>>> > > Patch 2 changes the metadata update code to update the L2 tables
>>> > > for multiple clusters at once.
>>> >
>>> > This patch series is a performance optimization. Benchmark results
>>> > are required to justify optimizations. Please include performance
>>> > results in the next revision.
>>> >
>>> > A popular disk I/O benchmarking tool is fio (https://github.com/axboe/fio).
>>> > I suggest a write-heavy workload with a large block size:
>>> >
>>> > $ cat fio.job
>>> > [global]
>>> > direct=1
>>> > filename=/dev/vdb
>>> > ioengine=libaio
>>> > runtime=30
>>> > ramp_time=5
>>> >
>>> > [job1]
>>> > iodepth=4
>>> > rw=randwrite
>>> > bs=256k
>>> >
>>> > $ for i in 1 2 3 4 5; do fio --output=fio-$i.txt fio.job; done  # WARNING: overwrites /dev/vdb
>>> >
>>> > It's good practice to run the benchmark several times because there is
>>> > usually some variation between runs. This allows you to check that
>>> > the variance is within a reasonable range (5-10% on a normal machine
>>> > that hasn't been specially prepared for benchmarking).
>>>
>>> I ran a few write tests of 128M using qemu-io and the results showed
>>> the time drop to almost half; will those work?
>>> I will also try to use the tool you mentioned later today when I am
>>> free and include those results as well.
>>
>> Maybe, it's hard to say without seeing the commands you ran.
>
> These are the commands I ran to test the write requests:
>
> My test file "test1.vmdk" is a 1G empty vmdk image created using the
> 'qemu-img' tool.
>
> Before optimization:
> $ ./bin/qemu-io -f vmdk --cache writeback
> qemu-io> open -n -o driver=vmdk test1.vmdk
> qemu-io> aio_write 0 128M
> qemu-io> wrote 134217728/134217728 bytes at offset 0
> 128 MiB, 1 ops; 0:00:16.46 (7.772 MiB/sec and 0.0607 ops/sec)
>
> After optimization:
> $ ./bin/qemu-io -f vmdk --cache writeback
> qemu-io> open -n -o driver=vmdk test1.vmdk
> qemu-io> aio_write 0 128M
> qemu-io> wrote 134217728/134217728 bytes at offset 0
> 128 MiB, 1 ops; 0:00:08.19 (15.627 MiB/sec and 0.1221 ops/sec)
>
> Will these work?
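For what it's worth, the ~2x figure in those qemu-io timings can be sanity-checked with a line of arithmetic (the numbers below are copied from the output above; this is a standalone sketch, not part of QEMU):

```python
# Sanity check of the qemu-io timings quoted above: did the 128 MiB
# aio_write really get ~2x faster? Times copied from the output.
before_s = 16.46  # seconds to write 128 MiB before the patch
after_s = 8.19    # seconds to write 128 MiB after the patch

speedup = before_s / after_s
print(f"speedup: {speedup:.2f}x")  # the write time roughly halved
```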
It is best to avoid --cache writeback in performance tests because going
through the host page cache puts the results at the mercy of the
kernel's caching behavior. I have run the following benchmark using
"qemu-img bench":

This patch series improves 128 KB sequential write performance to an
empty VMDK file by 29%.

Benchmark command: ./qemu-img bench -w -c 1024 -s 128K -d 1 -t none -f vmdk test.vmdk

(Please include the 2 lines above in the next revision of the patch.)

The qemu-img bench options used:
* -w issues write requests instead of reads
* -c 1024 terminates after 1024 requests
* -s 128K sets the request size to 128 KB
* -d 1 restricts the benchmark to 1 in-flight request at any time
* -t none uses O_DIRECT to bypass the host page cache

1. Without your patch

$ for i in 1 2 3 4 5; do ./qemu-img create -f vmdk test.vmdk 4G; ./qemu-img bench -w -c 1024 -s 128K -d 1 -t none -f vmdk test.vmdk; done
Formatting 'test.vmdk', fmt=vmdk size=4294967296 compat6=off hwversion=undefined
Sending 1024 write requests, 131072 bytes each, 1 in parallel (starting at offset 0, step size 131072)
Run completed in 35.081 seconds.
Formatting 'test.vmdk', fmt=vmdk size=4294967296 compat6=off hwversion=undefined
Sending 1024 write requests, 131072 bytes each, 1 in parallel (starting at offset 0, step size 131072)
Run completed in 34.548 seconds.
Formatting 'test.vmdk', fmt=vmdk size=4294967296 compat6=off hwversion=undefined
Sending 1024 write requests, 131072 bytes each, 1 in parallel (starting at offset 0, step size 131072)
Run completed in 34.637 seconds.
Formatting 'test.vmdk', fmt=vmdk size=4294967296 compat6=off hwversion=undefined
Sending 1024 write requests, 131072 bytes each, 1 in parallel (starting at offset 0, step size 131072)
Run completed in 34.411 seconds.
Formatting 'test.vmdk', fmt=vmdk size=4294967296 compat6=off hwversion=undefined
Sending 1024 write requests, 131072 bytes each, 1 in parallel (starting at offset 0, step size 131072)
Run completed in 34.599 seconds.

2. With your patch

$ for i in 1 2 3 4 5; do ./qemu-img create -f vmdk test.vmdk 4G; ./qemu-img bench -w -c 1024 -s 128K -d 1 -t none -f vmdk test.vmdk; done
Formatting 'test.vmdk', fmt=vmdk size=4294967296 compat6=off hwversion=undefined
Sending 1024 write requests, 131072 bytes each, 1 in parallel (starting at offset 0, step size 131072)
Run completed in 24.974 seconds.
Formatting 'test.vmdk', fmt=vmdk size=4294967296 compat6=off hwversion=undefined
Sending 1024 write requests, 131072 bytes each, 1 in parallel (starting at offset 0, step size 131072)
Run completed in 24.769 seconds.
Formatting 'test.vmdk', fmt=vmdk size=4294967296 compat6=off hwversion=undefined
Sending 1024 write requests, 131072 bytes each, 1 in parallel (starting at offset 0, step size 131072)
Run completed in 24.800 seconds.
Formatting 'test.vmdk', fmt=vmdk size=4294967296 compat6=off hwversion=undefined
Sending 1024 write requests, 131072 bytes each, 1 in parallel (starting at offset 0, step size 131072)
Run completed in 24.928 seconds.
Formatting 'test.vmdk', fmt=vmdk size=4294967296 compat6=off hwversion=undefined
Sending 1024 write requests, 131072 bytes each, 1 in parallel (starting at offset 0, step size 131072)
Run completed in 24.897 seconds.

Stefan
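P.S. A quick summary of the five runs in each case (times copied from the output above; plain arithmetic, not part of QEMU), which also confirms the run-to-run spread is well inside the 5-10% band:

```python
# Compare the 5-run mean "qemu-img bench" times before and after the
# patch; times in seconds, copied from the runs above.
before = [35.081, 34.548, 34.637, 34.411, 34.599]
after = [24.974, 24.769, 24.800, 24.928, 24.897]

mean_before = sum(before) / len(before)
mean_after = sum(after) / len(after)

# Run-to-run spread of the "before" runs, as a percentage of the mean.
spread = (max(before) - min(before)) / mean_before * 100

# Reduction in mean run time, close to the ~29% figure quoted above.
reduction = (1 - mean_after / mean_before) * 100

print(f"mean before: {mean_before:.2f}s, mean after: {mean_after:.2f}s")
print(f"spread (before): {spread:.1f}%")
print(f"time reduction: {reduction:.1f}%")
```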