On Wed, Jul 2, 2014 at 11:45 PM, Ming Lei <tom.leim...@gmail.com> wrote:
> On Wed, Jul 2, 2014 at 4:54 PM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
>> On Tue, Jul 01, 2014 at 06:49:30PM +0200, Paolo Bonzini wrote:
>>> Il 01/07/2014 16:49, Ming Lei ha scritto:
>>> >Let me provide some data when running randread(bs 4k, libaio)
>>> >from VM for 10sec:
>>> >
>>> >1), qemu.git/master
>>> >- write(): 731K
>>> >- rt_sigprocmask(): 417K
>>> >- read(): 21K
>>> >- ppoll(): 10K
>>> >- io_submit(): 5K
>>> >- io_getevents(): 4K
>>> >
>>> >2), qemu 2.0
>>> >- write(): 9K
>>> >- read(): 28K
>>> >- ppoll(): 16K
>>> >- io_submit(): 12K
>>> >- io_getevents(): 10K
>>> >
>>> >>> The sigprocmask can probably be optimized away since the thread's
>>> >>> signal mask remains unchanged most of the time.
>>> >>>
>>> >>> I'm not sure what is causing the write().
>>> >I am investigating it...
>>>
>>> I would guess sigprocmask is getcontext (from qemu_coroutine_new) and write
>>> is aio_notify (from qemu_bh_schedule).
>>
>> Aha!  We shouldn't be executing qemu_coroutine_new() very often since we
>> try to keep a freelist of coroutines.
>>
>> I think a tweak to the freelist could make the rt_sigprocmask() calls go
>> away since we should be reusing coroutines instead of allocating/freeing
>> them all the time.
>>
>>> Both can be eliminated by introducing a fast path in bdrv_aio_{read,write}v,
>>> that bypasses coroutines in the common case of no I/O throttling, no
>>> copy-on-write, etc.
>>
>> I tried that in 2012 and couldn't measure an improvement above the noise
>> threshold, although it was without dataplane.
>>
>> BTW, we cannot eliminate the BH because the block layer guarantees that
>> callbacks are not invoked with reentrancy.  They are always invoked
>> directly from the event loop through a BH.  This simplifies callers
>> since they don't need to worry about callbacks happening while they are
>> still in bdrv_aio_readv(), for example.
>>
>> Removing this guarantee (by making callers safe first) is orthogonal to
>> coroutines.  But it's hard to do since it requires auditing a lot of
>> code.
>>
>> Another idea is to skip aio_notify() when we're sure the event loop
>> isn't blocked in g_poll().  Doing this is a thread-safe and lockless way
>> might be tricky though.
>
> The attachment debug patch skips aio_notify() if qemu_bh_schedule
> is running from current aio context, but looks there is still 120K
> writes triggered. (without the patch, 400K can be observed in
> same test)
>
> So is there still other writes not found in the path?

That must be for generating guest irq, which should have been
processed as batch easily.


Thanks,
-- 
Ming Lei

Reply via email to