On 06/08/2014 10:38, Ming Lei wrote:
> On Wed, Aug 6, 2014 at 3:45 PM, Paolo Bonzini <pbonz...@redhat.com> wrote:
>> On 06/08/2014 07:33, Ming Lei wrote:
>>>>> I played a bit with the following, I hope it's not too naive. I
>>>>> couldn't see a difference with your patches, but at least one reason
>>>>> for this is probably that my laptop SSD isn't fast enough to make the
>>>>> CPU the bottleneck. Haven't tried a ramdisk yet, that would probably
>>>>> be the next thing. (I actually wrote the patch up just for some
>>>>> profiling of my own, not for comparing throughput, but it should be
>>>>> usable for that as well.)
>>> This might not be a good test, since it is basically a sequential
>>> read test, which can be optimized a lot by the kernel. And I always
>>> use a randread benchmark.
>>
>> A microbenchmark already exists in tests/test-coroutine.c, and it
>> doesn't really tell us much; it's obvious that coroutines execute more
>> code. The question is why that affects the iops performance.
>
> Could you take a look at the coroutine benchmark I wrote? The results
> show that coroutines decrease performance a lot compared with bypassing
> the coroutine layer, as the patchset does.
Your benchmark is synchronous, while disk I/O is asynchronous. It doesn't
add much over "time tests/test-coroutine -m perf -p /perf/yield", which
takes 8 seconds on my machine, while 10^8 plain function calls obviously
take much less than 8 seconds. I've sent a patch that adds a "baseline"
function call benchmark to test-coroutine.

>> The sequential read should be the right workload. For fio, you want to
>> get as many iops as possible to QEMU, so you need randread. But
>> qemu-img is not run in a guest, and if the kernel optimizes sequential
>> reads, then the bypass should show even more benefit, because it makes
>> userspace proportionally more expensive. Do you agree with this?

Paolo
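
For concreteness, here is a self-contained sketch of the kind of yield
microbenchmark discussed above. It is not QEMU's actual code: the real
/perf/yield test in tests/test-coroutine.c uses QEMU's coroutine API,
while this sketch uses raw POSIX ucontext (the mechanism behind QEMU's
default coroutine-ucontext backend), so it only approximates what
qemu_coroutine_yield() costs. The stack size and iteration count are
arbitrary choices for illustration.

/* yield-bench.c: time coroutine enter/yield round trips using raw
 * POSIX ucontext.  Each loop iteration is one enter plus one yield,
 * roughly what the /perf/yield test measures per iteration. */
#include <stdio.h>
#include <time.h>
#include <ucontext.h>

#define ITERATIONS 10000000UL

static ucontext_t main_ctx, co_ctx;
static unsigned long remaining = ITERATIONS;

/* Coroutine body: decrement the counter and yield back to the caller. */
static void yield_loop(void)
{
    while (remaining > 0) {
        remaining--;
        swapcontext(&co_ctx, &main_ctx);    /* the "yield" */
    }
}

int main(void)
{
    static char stack[64 * 1024];           /* coroutine stack */
    struct timespec t0, t1;

    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = sizeof(stack);
    co_ctx.uc_link = &main_ctx;
    makecontext(&co_ctx, yield_loop, 0);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (remaining > 0) {
        swapcontext(&main_ctx, &co_ctx);    /* the "enter" */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec)
               + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%lu round trips in %.2f s (%.0f ns each)\n",
           ITERATIONS, sec, sec * 1e9 / ITERATIONS);
    return 0;
}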
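
And a matching sketch of the "baseline" idea: timing the same kind of
loop, but with a plain call to a function the compiler cannot inline.
This is not the actual patch mentioned above, just an illustration of
the comparison; the noinline attribute is GCC/Clang-specific, and the
volatile counter merely keeps the optimizer from deleting the loop.

/* call-bench.c: baseline for the coroutine numbers; time plain
 * function calls that the compiler is prevented from inlining. */
#include <stdio.h>
#include <time.h>

#define ITERATIONS 100000000UL              /* 10^8, as in the text */

static __attribute__((noinline)) void empty_call(volatile unsigned long *n)
{
    (*n)--;                                 /* just touch the counter */
}

int main(void)
{
    volatile unsigned long remaining = ITERATIONS;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (remaining > 0) {
        empty_call(&remaining);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec)
               + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%lu calls in %.2f s (%.1f ns each)\n",
           ITERATIONS, sec, sec * 1e9 / ITERATIONS);
    return 0;
}

The absolute numbers vary by machine; the point of running both is the
per-iteration gap between a plain call and a context-switch round trip,
which is what the 8-second figure above illustrates.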