Kevin Wolf <kw...@redhat.com> writes:

> On 18.02.2020 at 15:12, Markus Armbruster wrote:
>> >> Regarding calling monitor_qmp_requests_pop_any_with_lock(): it needs
>> >> @monitor_lock and @mon_list to be valid.  We just initialized
>> >> @monitor_lock, and @mon_list is empty.
>> >> monitor_qmp_requests_pop_any_with_lock() should be safe and return null.
>> >> monitor_qmp_dispatcher_co() should also be safe and yield without doing
>> >> work.
>> >>
>> >> Can we exploit that to make things a bit simpler?  Separate patch would
>> >> be fine with me.
>> >
>> > If this is true, we could replace this line:
>> >
>> >     aio_co_schedule(iohandler_get_aio_context(), qmp_dispatcher_co);
>> >
>> > with the following one:
>> >
>> >     qemu_aio_coroutine_enter(iohandler_get_aio_context(),
>> >                              qmp_dispatcher_co);
>> >
>> > I'm not sure that this is any simpler.
>>
>> Naive question: what's the difference between "scheduling", "entering",
>> and "waking up" a coroutine?
>>
>> qemu_coroutine_enter() and qemu_aio_coroutine_enter() are in
>> coroutine.h.
>
> These are the low-level functions that just enter the coroutine (i.e. do
> a stack switch and jump to coroutine code) immediately when they are
> called. They must be called in the right thread with the right
> AioContext locks held. (What "right" means depends on the code run in
> the coroutine.)
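So, if I got this right, direct entry looks something like this (my own
toy sketch, untested, not from the patch):

    #include "qemu/osdep.h"
    #include "qemu/coroutine.h"
    #include "qemu/main-loop.h"

    static void coroutine_fn toy_co(void *opaque)
    {
        /* Runs on the coroutine's own stack; may qemu_coroutine_yield(). */
    }

    static void enter_toy_co(void)
    {
        Coroutine *co = qemu_coroutine_create(toy_co, NULL);

        /*
         * Immediate stack switch into toy_co(): the caller must already
         * be in the right thread and hold whatever AioContext lock the
         * coroutine's code needs.
         */
        qemu_aio_coroutine_enter(iohandler_get_aio_context(), co);
    }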
I see; low level building blocks.

>> aio_co_schedule(), aio_co_wake() and aio_co_enter() are in aio.h.
>
> aio_co_schedule() never enters the coroutine directly. It only adds it
> to the list of scheduled coroutines and schedules a BH in the target
> AioContext. This BH in turn will actually enter the coroutine.

Makes sense.

> aio_co_enter() enters the coroutine immediately if it's being called
> outside of coroutine context and in the right thread for the given
> AioContext. If it's in the right thread, but in coroutine context then
> it will queue the given coroutine so that it runs whenever the current
> coroutine yields. If it's in the wrong thread, it uses aio_co_schedule()
> to get it run in the right thread.

Uff, sounds complicated.  Lots of magic.

> aio_co_wake() takes just the coroutine as a parameter and calls
> aio_co_enter() with the context in which the coroutine was last run.

Complicated by extension.

> All of these functions make sure that the AioContext lock is taken when
> the coroutine is entered and that they run in the right thread.
>
>> qemu_coroutine_enter() calls qemu_aio_coroutine_enter().
>>
>> aio_co_wake() calls aio_co_enter() calls qemu_aio_coroutine_enter().
>>
>> aio_co_enter() seems to be independent.
>
> It's not. It calls either aio_co_schedule() or
> qemu_aio_coroutine_enter(), or inserts the coroutine in a queue that is
> processed in qemu_aio_coroutine_enter().

I was blind.

>> aio.h seems to be layered on top of coroutine.h.  Should I prefer using
>> aio.h to coroutine.h?
>
> In the common case, using the aio.h functions is preferable because they
> just do "the right thing" without requiring as much thinking as the
> low-level functions.

Got it.

>> >> >  }
>> >> >
>> >> >  QemuOptsList qemu_mon_opts = {
>> >> > diff --git a/monitor/qmp.c b/monitor/qmp.c
>> >> > index 54c06ba824..9444de9fcf 100644
>> >> > --- a/monitor/qmp.c
>> >> > +++ b/monitor/qmp.c
>> >> > @@ -133,6 +133,10 @@ static void monitor_qmp_respond(MonitorQMP *mon, QDict *rsp)
>> >> >      }
>> >> >  }
>> >> >
>> >> > +/*
>> >> > + * Runs outside of coroutine context for OOB commands, but in coroutine context
>> >> > + * for everything else.
>> >> > + */
>> >>
>> >> Nitpick: wrap around column 70, please.
>> >>
>> >> Note to self: the precondition is asserted in do_qmp_dispatch() below.
>> >> Asserting here is impractical, because we don't know whether this is an
>> >> OOB command.
>> >>
>> >> >  static void monitor_qmp_dispatch(MonitorQMP *mon, QObject *req)
>> >> >  {
>> >> >      Monitor *old_mon;
>> >> > @@ -211,43 +215,87 @@ static QMPRequest *monitor_qmp_requests_pop_any_with_lock(void)
>> >> >      return req_obj;
>> >> >  }
>> >> >
>> >> > -void monitor_qmp_bh_dispatcher(void *data)
>> >> > +void coroutine_fn monitor_qmp_dispatcher_co(void *data)
>> >> >  {
>> >> > -    QMPRequest *req_obj = monitor_qmp_requests_pop_any_with_lock();
>> >> > +    QMPRequest *req_obj = NULL;
>> >> >      QDict *rsp;
>> >> >      bool need_resume;
>> >> >      MonitorQMP *mon;
>> >> >
>> >> > -    if (!req_obj) {
>> >> > -        return;
>> >> > -    }
>> >> > +    while (true) {
>> >> > +        assert(atomic_mb_read(&qmp_dispatcher_co_busy) == true);
>> >>
>> >> Read and assert, then ...
>> >>
>> >> > +
>> >> > +        /* Mark the dispatcher as not busy already here so that we don't miss
>> >> > +         * any new requests coming in the middle of our processing. */
>> >> > +        atomic_mb_set(&qmp_dispatcher_co_busy, false);
>> >>
>> >> ... set.  Would exchange, then assert be cleaner?
>> >
>> > Then you would ask me why the exchange has to be atomic. :-)
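Just so we are looking at the same thing, the exchange variant I have in
mind would be something like this (my sketch, not part of the patch):

    /* As posted: read, assert, then clear. */
    assert(atomic_mb_read(&qmp_dispatcher_co_busy) == true);
    atomic_mb_set(&qmp_dispatcher_co_busy, false);

    /* Alternative: atomically clear and check the old value.  Needs a
     * temporary, or the side effect ends up inside assert(). */
    bool was_busy = atomic_xchg(&qmp_dispatcher_co_busy, false);
    assert(was_busy);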
>>
>> Possibly :)
>>
>> > More practically, I would need a temporary variable so that I don't get
>> > code with side effects in assert() (which may be compiled out with
>> > NDEBUG). The temporary variable would never be read outside the assert
>> > and would be unused with NDEBUG.
>> >
>> > So possible, yes, cleaner I'm not sure.
>>
>> I asked because the patch made me wonder whether qmp_dispatcher_co_busy
>> could change between the read and the set.
>
> Ah. No, this function is the only one that does a transition from true
> to false. Everyone else only transitions from false to true (or leaves
> the value unchanged as true).

True, but it's a non-local argument.  Anyway, not a big deal.

>> >> @@ -246,8 +287,15 @@ void monitor_qmp_bh_dispatcher(void *data)
>> >>      }
>> >>      qmp_request_free(req_obj);
>> >>
>> >> -    /* Reschedule instead of looping so the main loop stays responsive */
>> >> +    /*
>> >> +     * Yield and reschedule so the main loop stays responsive.
>> >> +     *
>> >> +     * Move back to iohandler_ctx so that nested event loops for
>> >> +     * qemu_aio_context don't start new monitor commands.
>> >>
>> >> Can you explain this sentence for dummies?
>> >
>> > Nested event loops (i.e. AIO_WAIT_WHILE) poll qemu_aio_context, so if we
>> > are scheduled there, the next iteration of the monitor dispatcher loop
>> > could start from a nested event loop. If we are scheduled in
>> > iohandler_ctx, then only the actual main loop will reenter the coroutine
>> > and nested event loops ignore it.
>> >
>> > I'm not sure if or why this is actually important, but this matches
>> > scheduling the dispatcher BH in iohandler_ctx in the code before this
>> > patch.
>> >
>> > If we didn't do this, we could end up starting monitor requests in more
>> > places than before, and who knows what that would mean.
>>
>> Let me say it in my own words, to make sure I got it.  I'm going to
>> ignore special cases like "not using I/O thread" and exec-oob.
>>
>> QMP monitor I/O happens in mon_iothread, in iohandler_ctx (I think).
>> This pushes requests onto the monitor's qmp_requests queue.
>
> mon_iothread (like all iothreads) has a separate AioContext, which
> doesn't have a name, but can be accessed with
> iothread_get_aio_context(mon_iothread).

Got it.

@iohandler_ctx and @qemu_aio_context both belong to the main loop.

@qemu_aio_context is for general "run in the main loop" use.  It is
polled both in the actual main loop and event loops nested in it.  I
figure "both" to keep things responsive.

@iohandler_ctx is for "I/O handlers" (whatever those are).  It is polled
only in the actual main loop.  I figure this means "I/O handlers" may get
delayed while a nested event loop runs.  But better starve a bit than run
in unexpected places.

>> Before this patch, the dispatcher runs in a bottom half in the main
>> thread, in qemu_aio_context.
>
> The dispatcher BH is what is scheduled in iohandler_ctx. This means that
> the BH is run from the main loop, but not from nested event loops.

Got it.

>> The patch moves it to a coroutine running in the main thread.  It runs
>> in iohandler_ctx, but switches to qemu_aio_context for executing command
>> handlers.
>>
>> We want to keep command handlers running in qemu_aio_context, as before
>> this patch.
>
> "Running in AioContext X" is kind of a sloppy term, and I'm afraid it
> might be more confusing than helpful here. What this means is that the
> code is run while holding the lock of the AioContext, and it registers
> its BHs, callbacks etc. in that AioContext so it will be called from the
> event loop of the respective thread.
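So "switches to qemu_aio_context for executing command handlers"
concretely means something like this, if I read the patch right (my
paraphrase of the idea, not the literal patch code):

    /*
     * Dispatcher coroutine: hop from iohandler_ctx into qemu_aio_context
     * so the handler runs where existing handlers expect to run ...
     */
    aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
    qemu_coroutine_yield();

    monitor_qmp_dispatch(mon, req_obj->req);

    /*
     * ... then hop back, so the next request is only picked up from the
     * real main loop, never from a nested event loop polling
     * qemu_aio_context.
     */
    aio_co_schedule(iohandler_get_aio_context(), qemu_coroutine_self());
    qemu_coroutine_yield();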
>
> Before this patch, command handlers can't really use callbacks without a
> nested event loop because they are synchronous.

I figure an example for such a nested event loop is BDRV_POLL_WHILE() in
bdrv_truncate(), which runs in the command handler for block_resize.
True?

> The BQL is held, which
> is equivalent to holding the qemu_aio_context lock.

Why is it equivalent?  Are they always taken together?

> But other than that,
> all of the definition of "running in an AioContext" doesn't really apply.
>
> Now with coroutines, you assign them an AioContext, which is the context
> in which BHs reentering the coroutine will be scheduled, e.g. from
> explicit aio_co_schedule(), after qemu_co_sleep_ns() or after waiting
> for a lock like a CoMutex.
>
> qemu_aio_context is what most (if not all) of the existing QMP handlers
> use when they run a nested event loop,

bdrv_truncate() appears to use bdrv_get_aio_context(), which is the
BlockDriverState's AioContext if set, else @qemu_aio_context.

> so running the coroutine in this
> context while calling the handlers will make the transition the easiest.

In the case of block_resize: running it in a coroutine makes
bdrv_truncate() use that coroutine instead of creating one together with
a nested event loop.  "That coroutine" uses @qemu_aio_context.  So does
the nested event loop, *except* when the BlockDriverState has an
AioContext.

I have no idea when a BlockDriverState has an AioContext.  Can you help?

>> We want to run the rest in iohandler_ctx to ensure dispatching happens
>> only in the main loop, as before this patch.
>>
>> Correct?
>
> This part is correct.

Alright, let me try again, so you can point out the next level of my
ignorance :)

QMP monitor I/O happens in I/O thread @mon_iothread, in the I/O thread's
AioContext.  The I/O thread's event loop polls monitor I/O.  The read
handler pushes requests onto the monitor's qmp_requests queue.

The dispatcher pops requests off these queues, and runs command handlers
for them.

Before this patch, the dispatcher runs in a bottom half in the main
thread with the BQL[*] held, in @iohandler_ctx context.  The latter
ensures dispatch happens only from the main loop, not nested event loops.

The command handlers don't care for the AioContext; they're synchronous.
If they need AIO internally, they have to set up a nested event loop.
When they do, they pick the AioContext explicitly, so they still don't
rely on the current context.

The patch moves the dispatcher to a coroutine running in the main thread.
It runs in iohandler_ctx, but switches to qemu_aio_context for executing
command handlers.

The dispatcher keeps running in @iohandler_ctx.  Good.

The command handlers now run in @qemu_aio_context.  Doesn't matter
because "command handlers don't care for the AioContext".  Except when a
command handler has a fast path like bdrv_truncate() has.  For those,
@iohandler_ctx is generally not what we want.  @qemu_aio_context is.
"Generally", since every one needs to be inspected for AioContext
shenanigans.

[*] @qemu_global_mutex, a.k.a. "iothread mutex"; pity anyone who needs to
figure this out from the source code by himself
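By "fast path" above I mean the usual synchronous block-layer wrapper
pattern, roughly like this (heavily simplified sketch with made-up names,
not the actual bdrv_truncate() code):

    /* Hypothetical stand-in for a bdrv_truncate()-style wrapper */
    typedef struct TruncateCo {
        BlockDriverState *bs;
        int64_t offset;
        int ret;                    /* -EINPROGRESS until done */
    } TruncateCo;

    static void coroutine_fn truncate_co_entry(void *opaque)
    {
        TruncateCo *tco = opaque;
        tco->ret = 0;               /* the actual work would happen here */
    }

    static int truncate_sync_sketch(BlockDriverState *bs, int64_t offset)
    {
        TruncateCo tco = { .bs = bs, .offset = offset, .ret = -EINPROGRESS };

        if (qemu_in_coroutine()) {
            /* Fast path: already in a coroutine, e.g. the QMP dispatcher
             * coroutine after this patch.  No nested event loop. */
            truncate_co_entry(&tco);
        } else {
            /* Slow path: spawn a coroutine and run a nested event loop
             * in the BDS's AioContext until it finishes. */
            Coroutine *co = qemu_coroutine_create(truncate_co_entry, &tco);
            bdrv_coroutine_enter(bs, co);
            BDRV_POLL_WHILE(bs, tco.ret == -EINPROGRESS);
        }
        return tco.ret;
    }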