On Mon, Jun 30, 2014 at 4:08 PM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
> On Sat, Jun 28, 2014 at 05:58:58PM +0800, Ming Lei wrote:
>> On Sat, Jun 28, 2014 at 5:51 AM, Paolo Bonzini <pbonz...@redhat.com> wrote:
>> > On 27/06/2014 20:01, Ming Lei wrote:
>> >
>> >> I just implemented plug&unplug based batching, and it is working now.
>> >> But throughput still has no obvious improvement.
>> >>
>> >> The load in the IOThread looks a bit low, so I am wondering if there
>> >> is a blocking point caused by the QEMU block layer.
>> >
>> > What does perf say?  Also, you can try using the QEMU trace subsystem
>> > and see where the latency goes.
>>
>> Below are some test results against 8589744aaf07b62 of upstream QEMU;
>> the tests were done on my 2-core (4-thread) laptop:
>>
>> 1. With my draft batch patches[1] (only linux-aio is supported now):
>>    - throughput: +16% compared with upstream QEMU
>>    - average time spent in handle_notify(): 310us
>>    - average time between two handle_notify() calls: 1591us
>>      (this reflects the latency of handling the host_notifier)
>
> 16% is still a worthwhile improvement.  I guess batching only benefits
> aio=native since the threadpool ought to do better when it receives
> requests as soon as possible.
The 16% was measured with the 'simple' trace backend enabled; the actual
number with the 'nop' backend looks quite a bit better than 16%, but it is
still not as good as the 2.0.0 release.

> Patch or an RFC would be welcome.

Yes, I will post it soon.

>> 2. Same tests on the 2.0.0 release (which uses the custom Linux AIO code):
>>    - average time spent in handle_notify(): 68us
>>    - average time between two handle_notify() calls: 269us
>>      (this reflects the latency of handling the host_notifier)
>>
>> From the above tests, the root cause looks like late handling of the
>> notify, and the QEMU block layer has become about 4 times slower than
>> the previous custom linux-aio code used by dataplane.

The above data was also obtained with the 'simple' trace backend enabled;
I need to find another way to test again without the extra trace I/O.

> Try:
>
>   $ perf record -e syscalls:* --tid <iothread-tid>
>   ^C
>   $ perf script  # shows the trace log
>
> The difference between syscalls in QEMU 2.0 and qemu.git/master could
> reveal the problem.
>
> Using perf you can also trace ioeventfd signalling in the host kernel
> and compare against the QEMU handle_notify entry/return.  It may be
> easiest to use the ftrace_marker tracing backend in QEMU so the trace is
> unified with the host kernel trace (./configure
> --enable-trace-backend=ftrace and see the ftrace section in QEMU
> docs/tracing.txt).
>
> This way you can see whether the ioeventfd signal -> handle_notify()
> entry increased or something else is going on.

These look like good ideas; I will try them.

I have tried ftrace, but it looks like some traces may be dropped, and my
current script cannot handle that well.

Thanks,

-- 
Ming Lei
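P.S. A quick sanity check of the "about 4 times slower" estimate from the
numbers quoted in this thread (a throwaway sketch; the variable names are
mine, the microsecond figures are the measurements posted above):

```python
# Measurements quoted in the thread, in microseconds.
master_handle_notify = 310   # qemu.git/master + draft batch patches
v200_handle_notify = 68      # QEMU 2.0.0 with custom linux-aio

master_interval = 1591       # time between two handle_notify() calls
v200_interval = 269

# How much slower master is than the 2.0.0 release.
notify_ratio = master_handle_notify / v200_handle_notify
interval_ratio = master_interval / v200_interval

print(f"handle_notify(): {notify_ratio:.1f}x slower")   # ~4.6x
print(f"notify latency:  {interval_ratio:.1f}x slower")  # ~5.9x
```

So the time spent inside handle_notify() is ~4.6x worse, while the gap
between notifications (the host_notifier handling latency) is closer to
~5.9x; both are consistent with the rough "4 times slower" estimate above.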