On Wed, Aug 6, 2014 at 3:45 PM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> On 06/08/2014 07:33, Ming Lei wrote:
>>> > I played a bit with the following, I hope it's not too naive. I couldn't
>>> > see a difference with your patches, but at least one reason for this is
>>> > probably that my laptop SSD isn't fast enough to make the CPU the
>>> > bottleneck. Haven't tried a ramdisk yet, that would probably be the next
>>> > thing. (I actually wrote the patch up just for some profiling of my own,
>>> > not for comparing throughput, but it should be usable for that as well.)
>>
>> This might not be good for the test, since it is basically a sequential
>> read test, which can be optimized a lot by the kernel. And I always use
>> a randread benchmark.
>
> A microbenchmark already exists in tests/test-coroutine.c, and doesn't
> really tell us much; it's obvious that coroutines execute more code, the
> question is why it affects the iops performance.
Could you take a look at the coroutine benchmark I wrote? The results show
that going through the coroutine does decrease performance a lot compared
with bypassing it, which is what this patchset does.

> The sequential read should be the right workload. For fio, you want to
> get as many iops as possible to QEMU and so you need randread. But
> qemu-img is not run in a guest, and if the kernel optimizes sequential
> reads then the bypass should have even more benefits, because it makes
> userspace proportionally more expensive.
>
> In any case, the patches as written have no hope of being accepted. If
> you "invert" the logic from aio->co to co->aio, that would be much
> better even if it's tedious.

Let's set the bypass patch aside and first see whether the coroutine is
really the cause of the performance drop.

Thanks,
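
P.S. For anyone who wants a feel for the kind of comparison being discussed,
here is a minimal standalone sketch: a direct call per "request" versus
entering a context-switching coroutine per request. This is not the QEMU
tests/test-coroutine.c benchmark and not the benchmark mentioned above; it
uses plain POSIX ucontext and made-up names, only to illustrate where the
per-request overhead can come from.

/*
 * Illustrative sketch only: compare a plain function call per request with
 * creating/entering a ucontext-based coroutine per request.
 */
#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <ucontext.h>

#define ITERATIONS 1000000
#define STACK_SIZE (64 * 1024)

static ucontext_t caller_ctx, co_ctx;
static char co_stack[STACK_SIZE];
static volatile uint64_t counter;

/* The "request handler": trivial work, so we mostly measure call overhead. */
static void handle_request(void)
{
    counter++;
}

/* Coroutine body: do the work, then switch back to the caller. */
static void co_entry(void)
{
    handle_request();
    swapcontext(&co_ctx, &caller_ctx);
}

static double elapsed_ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void)
{
    struct timespec t0, t1;

    /* Baseline: direct call per request (the "bypass" path). */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < ITERATIONS; i++) {
        handle_request();
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("direct calls:  %.1f ms\n", elapsed_ms(t0, t1));

    /* Coroutine path: set up and enter a context per request. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < ITERATIONS; i++) {
        getcontext(&co_ctx);
        co_ctx.uc_stack.ss_sp = co_stack;
        co_ctx.uc_stack.ss_size = STACK_SIZE;
        co_ctx.uc_link = &caller_ctx;
        makecontext(&co_ctx, co_entry, 0);
        swapcontext(&caller_ctx, &co_ctx);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("via coroutine: %.1f ms\n", elapsed_ms(t0, t1));

    return 0;
}

Built on Linux with something like "gcc -O2 co-bench.c -o co-bench"; the
absolute numbers mean little, only the relative gap between the two loops,
and a real block layer adds allocation and scheduling costs on top of the
raw context switch shown here.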