On Tue, Jun 10, 2014 at 09:40:38AM +0800, Fam Zheng wrote:
> On Mon, 06/09 15:43, Karl Rister wrote:
> > Hi All
> >
> > I was asked by our development team to do a performance sniff test of
> > the latest dataplane code on s390 and compare it against qemu.git.
> > Here is a brief description of the configuration, the testing done,
> > and then the results.
> >
> > Configuration:
> >
> > Host:  26 CPU LPAR, 64GB, 8 zFCP adapters
> > Guest: 4 VCPU, 1GB, 128 virtio block devices
> >
> > Each virtio block device maps to a dm-multipath device in the host
> > with 8 paths. Multipath is configured with the service-time policy.
> > All block devices are configured to use the deadline IO scheduler.
> >
> > Test:
> >
> > FIO is used to run 4 scenarios: sequential read, sequential write,
> > random read, and random write. Sequential scenarios use a 128KB
> > request size and random scenarios use an 8KB request size. Each
> > scenario is run with an increasing number of jobs, from 1 to 128
> > (powers of 2). Each job is bound to an individual file on an ext3
> > file system on a virtio device and uses O_DIRECT, libaio, and
> > iodepth=1. Each test is run three times for 2 minutes each; the
> > first iteration (a warmup) is thrown out and the next two iterations
> > are averaged together.
> >
> > Results:
> >
> > Baseline:  qemu.git 93f94f9018229f146ed6bbe9e5ff72d67e4bd7ab
> > Dataplane: bdrv_set_aio_context 0ab50cde71aa27f39b8a3ea4766ff82671adb2a4
>
> Hi Karl,
>
> Thanks for the results.
>
> The throughput differences look minimal; where is the bandwidth
> saturated in these tests? And why use iodepth=1, not more?
>
> Thanks,
> Fam
>
> > Sequential Read:
> >
> > Overall a slight throughput regression with a noticeable reduction
> > in CPU efficiency.
> >
> > 1 Job:   Throughput regressed -1.4%,  CPU improved  -0.83%
> > 2 Job:   Throughput regressed -2.5%,  CPU regressed +2.81%
> > 4 Job:   Throughput regressed -2.2%,  CPU regressed +12.22%
> > 8 Job:   Throughput regressed -0.7%,  CPU regressed +9.77%
> > 16 Job:  Throughput regressed -3.4%,  CPU regressed +7.04%
> > 32 Job:  Throughput regressed -1.8%,  CPU regressed +12.03%
> > 64 Job:  Throughput regressed -0.1%,  CPU regressed +10.60%
> > 128 Job: Throughput increased +0.3%,  CPU regressed +10.70%
> >
> > Sequential Write:
> >
> > Mostly regressed throughput, although it gets better as job count
> > increases and even has some gains at higher job counts. CPU
> > efficiency is regressed.
> >
> > 1 Job:   Throughput regressed -1.9%,  CPU regressed +0.90%
> > 2 Job:   Throughput regressed -2.0%,  CPU regressed +1.07%
> > 4 Job:   Throughput regressed -2.4%,  CPU regressed +8.68%
> > 8 Job:   Throughput regressed -2.0%,  CPU regressed +4.23%
> > 16 Job:  Throughput regressed -5.0%,  CPU regressed +10.53%
> > 32 Job:  Throughput improved  +7.6%,  CPU regressed +7.37%
> > 64 Job:  Throughput regressed -0.6%,  CPU regressed +7.29%
> > 128 Job: Throughput improved  +8.3%,  CPU regressed +6.68%
> >
> > Random Read:
> >
> > Again, mostly throughput regressions except for the largest job
> > counts. CPU efficiency is regressed at all data points.
> >
> > 1 Job:   Throughput regressed -3.0%,  CPU regressed +0.14%
> > 2 Job:   Throughput regressed -3.6%,  CPU regressed +6.86%
> > 4 Job:   Throughput regressed -5.1%,  CPU regressed +11.11%
> > 8 Job:   Throughput regressed -8.6%,  CPU regressed +12.32%
> > 16 Job:  Throughput regressed -5.7%,  CPU regressed +12.99%
> > 32 Job:  Throughput regressed -7.4%,  CPU regressed +7.62%
> > 64 Job:  Throughput improved  +10.0%, CPU regressed +10.83%
> > 128 Job: Throughput improved  +10.7%, CPU regressed +10.85%
> >
> > Random Write:
> >
> > Throughput and CPU regressed at all but one data point.
> >
> > 1 Job:   Throughput regressed -2.3%,  CPU improved  -1.50%
> > 2 Job:   Throughput regressed -2.2%,  CPU regressed +0.16%
> > 4 Job:   Throughput regressed -1.0%,  CPU regressed +8.36%
> > 8 Job:   Throughput regressed -8.6%,  CPU regressed +12.47%
> > 16 Job:  Throughput regressed -3.1%,  CPU regressed +12.40%
> > 32 Job:  Throughput regressed -0.2%,  CPU regressed +11.59%
> > 64 Job:  Throughput regressed -1.9%,  CPU regressed +12.65%
> > 128 Job: Throughput improved  +5.6%,  CPU regressed +11.68%
> >
> > * CPU consumption is an efficiency calculation of usage per MB of
> >   throughput.
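To make the footnoted efficiency metric concrete, here is a worked
example with illustrative numbers (not taken from the runs above): if
the baseline sustained 400 MB/s while consuming 40% CPU, that is 0.100
CPU-%/MBps; if the dataplane run sustained 396 MB/s at 44% CPU, that is
0.111 CPU-%/MBps. Throughput regressed only -1.0%, but CPU efficiency
regressed +11.1% -- the shape of most of the data points above.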
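For anyone trying to approximate the setup described under "Test:"
above, a fio job file along the following lines would match the stated
parameters. This is only a sketch, not Karl's actual job file; the
filename, file size, and mount point are assumptions:

  [global]
  ; O_DIRECT + libaio + iodepth=1, 2-minute timed runs
  ioengine=libaio
  direct=1
  iodepth=1
  runtime=120
  time_based=1

  ; sequential read scenario: 128KB requests
  ; (use rw=randread and bs=8k for the random read scenario, etc.)
  [job1]
  rw=read
  bs=128k
  ; one file per job, on an ext3 filesystem on a virtio device
  filename=/mnt/vdisk001/fio.dat
  size=512m

On the host side, the deadline scheduler mentioned above is typically
selected per device with "echo deadline > /sys/block/<dev>/queue/scheduler",
and the service-time policy with path_selector "service-time 0" in
multipath.conf.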
Thanks for sharing!

This is actually not too bad considering that the bdrv_set_aio_context()
code uses the QEMU block layer while the older qemu.git code uses a
custom Linux AIO code path.

The CPU efficiency regression is interesting. Do you have any profiling
data that shows where the hot spots are?

Thanks,
Stefan
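For reference, a common way to collect the kind of host-side profiling
data asked about above is perf. A minimal sketch, assuming the QEMU
process can be located by binary name (the exact name depends on the
build target):

  # sample the QEMU process with call graphs for 60 seconds
  perf record -g -p "$(pidof qemu-system-s390x)" -- sleep 60
  # break the samples down by symbol to find the hot spots
  perf report --sort symbol

Comparing the per-symbol profiles of the baseline and dataplane builds
would show where the extra CPU time per MB is being spent.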