On Wed, Jul 2, 2014 at 11:45 PM, Ming Lei <tom.leim...@gmail.com> wrote:
> On Wed, Jul 2, 2014 at 4:54 PM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
>> On Tue, Jul 01, 2014 at 06:49:30PM +0200, Paolo Bonzini wrote:
>>> On 01/07/2014 16:49, Ming Lei wrote:
>>> > Let me provide some data when running randread(bs 4k, libaio)
>>> > from the VM for 10 sec:
>>> >
>>> > 1) qemu.git/master
>>> > - write(): 731K
>>> > - rt_sigprocmask(): 417K
>>> > - read(): 21K
>>> > - ppoll(): 10K
>>> > - io_submit(): 5K
>>> > - io_getevents(): 4K
>>> >
>>> > 2) qemu 2.0
>>> > - write(): 9K
>>> > - read(): 28K
>>> > - ppoll(): 16K
>>> > - io_submit(): 12K
>>> > - io_getevents(): 10K
>>> >
>>> >>> The sigprocmask can probably be optimized away since the thread's
>>> >>> signal mask remains unchanged most of the time.
>>> >>>
>>> >>> I'm not sure what is causing the write().
>>> > I am investigating it...
>>>
>>> I would guess the sigprocmask is getcontext (from qemu_coroutine_new)
>>> and the write is aio_notify (from qemu_bh_schedule).
>>
>> Aha! We shouldn't be executing qemu_coroutine_new() very often since we
>> try to keep a freelist of coroutines.
>>
>> I think a tweak to the freelist could make the rt_sigprocmask() calls go
>> away, since we should be reusing coroutines instead of allocating/freeing
>> them all the time.
>>
>>> Both can be eliminated by introducing a fast path in bdrv_aio_{read,write}v
>>> that bypasses coroutines in the common case of no I/O throttling, no
>>> copy-on-write, etc.
>>
>> I tried that in 2012 and couldn't measure an improvement above the noise
>> threshold, although it was without dataplane.
>>
>> BTW, we cannot eliminate the BH because the block layer guarantees that
>> callbacks are not invoked reentrantly; they are always invoked directly
>> from the event loop through a BH. This simplifies callers since they
>> don't need to worry about callbacks happening while they are still in
>> bdrv_aio_readv(), for example.
>>
>> Removing this guarantee (by making callers safe first) is orthogonal to
>> coroutines. But it's hard to do since it requires auditing a lot of code.
>>
>> Another idea is to skip aio_notify() when we're sure the event loop
>> isn't blocked in g_poll(). Doing this in a thread-safe and lockless way
>> might be tricky though.
>
> The attached debug patch skips aio_notify() if qemu_bh_schedule is
> running from the current aio context, but it looks like there are still
> 120K writes triggered (without the patch, 400K can be observed in the
> same test).
>
> So are there still other writes not found in the path?
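For reference, the idea behind the debug patch mentioned above is roughly the
following. This is only a simplified, self-contained sketch, not the attached
patch and not QEMU code; EventLoop, loop_notify() and the thread check are
invented for illustration:

```c
/*
 * Simplified sketch of "skip the notify write when the BH is scheduled
 * from the event-loop thread itself".  Not the attached patch and not
 * QEMU code; EventLoop, loop_notify() and the fields below are invented.
 */
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>

typedef struct {
    int notify_fd;          /* eventfd/pipe the loop sleeps on in poll() */
    pthread_t loop_thread;  /* thread that runs the event loop          */
    /* ... list of pending bottom halves lives here as well ...         */
} EventLoop;

/* Wake the loop so it notices newly scheduled work (the aio_notify() role). */
static void loop_notify(EventLoop *loop)
{
    /*
     * If we are already running inside the loop thread, it cannot be
     * blocked in poll() right now: it will re-check the pending BH list
     * before sleeping again, so the eventfd write (the write() that shows
     * up in the syscall counts above) can be skipped.
     */
    if (pthread_equal(pthread_self(), loop->loop_thread)) {
        return;
    }

    uint64_t one = 1;
    ssize_t ret = write(loop->notify_fd, &one, sizeof(one));
    (void)ret;   /* a short/failed write is ignored in this sketch */
}
```

Skipping the write() this way is only safe if the loop always re-checks its
pending BH list after dispatching handlers and before blocking in
ppoll()/g_poll() again; otherwise a wakeup can be lost, which is presumably
the tricky lockless part Stefan mentioned.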
The remaining writes must be for generating the guest irq, and those could
easily be processed as a batch.
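What "processed as a batch" could look like is roughly the following; again
only a sketch, assuming the interrupt is injected by a write to an
irqfd-style file descriptor, with VirtQueueCtx, irqfd and irq_pending as
invented names rather than dataplane code:

```c
/*
 * Sketch of batching the guest notification: complete everything that is
 * ready first and inject the interrupt once per event-loop iteration.
 * Not dataplane/QEMU code; names are invented for illustration.
 */
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

typedef struct {
    int irqfd;          /* fd whose write() injects the guest interrupt  */
    bool irq_pending;   /* at least one request completed this iteration */
} VirtQueueCtx;

/* Called for every completed request: only record that an irq is needed. */
static void complete_one_request(VirtQueueCtx *vq)
{
    /* ...push the used element into the vring here... */
    vq->irq_pending = true;     /* defer the actual notification */
}

/* Called once per event-loop iteration, after all completions are drained. */
static void flush_guest_notification(VirtQueueCtx *vq)
{
    if (!vq->irq_pending) {
        return;
    }
    vq->irq_pending = false;

    uint64_t one = 1;
    ssize_t ret = write(vq->irqfd, &one, sizeof(one)); /* one write for N requests */
    (void)ret;
}
```

With something along these lines the write() count should drop to roughly one
per loop iteration instead of one per completed request.

Thanks,
--
Ming Lei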