Kevin Wolf <kw...@redhat.com> writes:

> On 18.02.2020 at 15:12, Markus Armbruster wrote:
>> >> Regarding calling monitor_qmp_requests_pop_any_with_lock(): it needs
>> >> @monitor_lock and @mon_list to be valid.  We just initialized
>> >> @monitor_lock, and @mon_list is empty.
>> >> monitor_qmp_requests_pop_any_with_lock() should be safe and return null.
>> >> monitor_qmp_dispatcher_co() should also be safe and yield without doing
>> >> work.
>> >>
>> >> Can we exploit that to make things a bit simpler?  Separate patch would
>> >> be fine with me.
>> >
>> > If this is true, we could replace this line:
>> >
>> >     aio_co_schedule(iohandler_get_aio_context(), qmp_dispatcher_co);
>> >
>> > with the following one:
>> >
>> >     qemu_aio_coroutine_enter(iohandler_get_aio_context(),
>> >                              qmp_dispatcher_co);
>> >
>> > I'm not sure that this is any simpler.
>>
>> Naive question: what's the difference between "scheduling", "entering",
>> and "waking up" a coroutine?
>>
>> qemu_coroutine_enter() and qemu_aio_coroutine_enter() are in
>> coroutine.h.
>
> These are the low-level functions that just enter the coroutine (i.e. do
> a stack switch and jump to coroutine code) immediately when they are
> called. They must be called in the right thread with the right
> AioContext locks held. (What "right" means depends on the code run in
> the coroutine.)
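So, if I got this right, direct entry looks something like this (my own
toy sketch, untested, not from the patch):

    #include "qemu/osdep.h"
    #include "qemu/coroutine.h"
    #include "qemu/main-loop.h"

    static void coroutine_fn toy_co(void *opaque)
    {
        /* Runs on the coroutine's own stack; may qemu_coroutine_yield(). */
    }

    static void enter_toy_co(void)
    {
        Coroutine *co = qemu_coroutine_create(toy_co, NULL);

        /*
         * Immediate stack switch into toy_co(): the caller must already
         * be in the right thread and hold whatever AioContext lock the
         * coroutine's code needs.
         */
        qemu_aio_coroutine_enter(iohandler_get_aio_context(), co);
    }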
I see; low level building blocks.

>> aio_co_schedule(), aio_co_wake() and aio_co_enter() are in aio.h.
>
> aio_co_schedule() never enters the coroutine directly. It only adds it
> to the list of scheduled coroutines and schedules a BH in the target
> AioContext. This BH in turn will actually enter the coroutine.

Makes sense.

> aio_co_enter() enters the coroutine immediately if it's being called
> outside of coroutine context and in the right thread for the given
> AioContext. If it's in the right thread, but in coroutine context then
> it will queue the given coroutine so that it runs whenever the current
> coroutine yields. If it's in the wrong thread, it uses aio_co_schedule()
> to get it run in the right thread.

Uff, sounds complicated.  Lots of magic.

> aio_co_wake() takes just the coroutine as a parameter and calls
> aio_co_enter() with the context in which the coroutine was last run.

Complicated by extension.

> All of these functions make sure that the AioContext lock is taken when
> the coroutine is entered and that they run in the right thread.
>
>> qemu_coroutine_enter() calls qemu_aio_coroutine_enter().
>>
>> aio_co_wake() calls aio_co_enter() calls qemu_aio_coroutine_enter().
>>
>> aio_co_enter() seems to be independent.
>
> It's not. It calls either aio_co_schedule() or
> qemu_aio_coroutine_enter(), or inserts the coroutine in a queue that is
> processed in qemu_aio_coroutine_enter().

I was blind.

>> aio.h seems to be layered on top of coroutine.h.  Should I prefer using
>> aio.h to coroutine.h?
>
> In the common case, using the aio.h functions is preferable because they
> just do "the right thing" without requiring as much thinking as the
> low-level functions.

Got it.

>> >> >  }
>> >> >
>> >> >  QemuOptsList qemu_mon_opts = {
>> >> > diff --git a/monitor/qmp.c b/monitor/qmp.c
>> >> > index 54c06ba824..9444de9fcf 100644
>> >> > --- a/monitor/qmp.c
>> >> > +++ b/monitor/qmp.c
>> >> > @@ -133,6 +133,10 @@ static void monitor_qmp_respond(MonitorQMP *mon, QDict *rsp)
>> >> >      }
>> >> >  }
>> >> >
>> >> > +/*
>> >> > + * Runs outside of coroutine context for OOB commands, but in coroutine context
>> >> > + * for everything else.
>> >> > + */
>> >>
>> >> Nitpick: wrap around column 70, please.
>> >>
>> >> Note to self: the precondition is asserted in do_qmp_dispatch() below.
>> >> Asserting here is impractical, because we don't know whether this is an
>> >> OOB command.
>> >>
>> >> >  static void monitor_qmp_dispatch(MonitorQMP *mon, QObject *req)
>> >> >  {
>> >> >      Monitor *old_mon;
>> >> > @@ -211,43 +215,87 @@ static QMPRequest *monitor_qmp_requests_pop_any_with_lock(void)
>> >> >      return req_obj;
>> >> >  }
>> >> >
>> >> > -void monitor_qmp_bh_dispatcher(void *data)
>> >> > +void coroutine_fn monitor_qmp_dispatcher_co(void *data)
>> >> >  {
>> >> > -    QMPRequest *req_obj = monitor_qmp_requests_pop_any_with_lock();
>> >> > +    QMPRequest *req_obj = NULL;
>> >> >      QDict *rsp;
>> >> >      bool need_resume;
>> >> >      MonitorQMP *mon;
>> >> >
>> >> > -    if (!req_obj) {
>> >> > -        return;
>> >> > -    }
>> >> > +    while (true) {
>> >> > +        assert(atomic_mb_read(&qmp_dispatcher_co_busy) == true);
>> >>
>> >> Read and assert, then ...
>> >>
>> >> > +
>> >> > +        /* Mark the dispatcher as not busy already here so that we don't miss
>> >> > +         * any new requests coming in the middle of our processing. */
>> >> > +        atomic_mb_set(&qmp_dispatcher_co_busy, false);
>> >>
>> >> ... set.  Would exchange, then assert be cleaner?
>> >
>> > Then you would ask me why the exchange has to be atomic. :-)
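Just so we are looking at the same thing, the exchange variant I have in
mind would be something like this (my sketch, not part of the patch):

    /* As posted: read, assert, then clear. */
    assert(atomic_mb_read(&qmp_dispatcher_co_busy) == true);
    atomic_mb_set(&qmp_dispatcher_co_busy, false);

    /* Alternative: atomically clear and check the old value.  Needs a
     * temporary, or the side effect ends up inside assert(). */
    bool was_busy = atomic_xchg(&qmp_dispatcher_co_busy, false);
    assert(was_busy);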
>>
>> Possibly :)
>>
>> > More practically, I would need a temporary variable so that I don't get
>> > code with side effects in assert() (which may be compiled out with
>> > NDEBUG). The temporary variable would never be read outside the assert
>> > and would be unused with NDEBUG.
>> >
>> > So possible, yes, cleaner I'm not sure.
>>
>> I asked because the patch made me wonder whether qmp_dispatcher_co_busy
>> could change between the read and the set.
>
> Ah. No, this function is the only one that does a transition from true
> to false. Everyone else only transitions from false to true (or leaves
> the value unchanged as true).

True, but it's a non-local argument.  Anyway, not a big deal.

>> >> @@ -246,8 +287,15 @@ void monitor_qmp_bh_dispatcher(void *data)
>> >>      }
>> >>      qmp_request_free(req_obj);
>> >>
>> >> -    /* Reschedule instead of looping so the main loop stays responsive */
>> >> +    /*
>> >> +     * Yield and reschedule so the main loop stays responsive.
>> >> +     *
>> >> +     * Move back to iohandler_ctx so that nested event loops for
>> >> +     * qemu_aio_context don't start new monitor commands.
>> >>
>> >> Can you explain this sentence for dummies?
>> >
>> > Nested event loops (i.e. AIO_WAIT_WHILE) poll qemu_aio_context, so if we
>> > are scheduled there, the next iteration of the monitor dispatcher loop
>> > could start from a nested event loop. If we are scheduled in
>> > iohandler_ctx, then only the actual main loop will reenter the coroutine
>> > and nested event loops ignore it.
>> >
>> > I'm not sure if or why this is actually important, but this matches
>> > scheduling the dispatcher BH in iohandler_ctx in the code before this
>> > patch.
>> >
>> > If we didn't do this, we could end up starting monitor requests in more
>> > places than before, and who knows what that would mean.
>>
>> Let me say it in my own words, to make sure I got it.  I'm going to
>> ignore special cases like "not using I/O thread" and exec-oob.
>>
>> QMP monitor I/O happens in mon_iothread, in iohandler_ctx (I think).
>> This pushes requests onto the monitor's qmp_requests queue.
>
> mon_iothread (like all iothreads) has a separate AioContext, which
> doesn't have a name, but can be accessed with
> iothread_get_aio_context(mon_iothread).

Got it.

@iohandler_ctx and @qemu_aio_context both belong to the main loop.

@qemu_aio_context is for general "run in the main loop" use.  It is
polled both in the actual main loop and event loops nested in it.  I
figure "both" to keep things responsive.

@iohandler_ctx is for "I/O handlers" (whatever those are).  It is polled
only in the actual main loop.  I figure this means "I/O handlers" may get
delayed while a nested event loop runs.  But better starve a bit than run
in unexpected places.

>> Before this patch, the dispatcher runs in a bottom half in the main
>> thread, in qemu_aio_context.
>
> The dispatcher BH is what is scheduled in iohandler_ctx. This means that
> the BH is run from the main loop, but not from nested event loops.

Got it.

>> The patch moves it to a coroutine running in the main thread.  It runs
>> in iohandler_ctx, but switches to qemu_aio_context for executing command
>> handlers.
>>
>> We want to keep command handlers running in qemu_aio_context, as before
>> this patch.
>
> "Running in AioContext X" is kind of a sloppy term, and I'm afraid it
> might be more confusing than helpful here. What this means is that the
> code is run while holding the lock of the AioContext, and it registers
> its BHs, callbacks etc. in that AioContext so it will be called from the
> event loop of the respective thread.
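So "switches to qemu_aio_context for executing command handlers"
concretely means something like this, if I read the patch right (my
paraphrase of the idea, not the literal patch code):

    /*
     * Dispatcher coroutine: hop from iohandler_ctx into qemu_aio_context
     * so the handler runs where existing handlers expect to run ...
     */
    aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
    qemu_coroutine_yield();

    monitor_qmp_dispatch(mon, req_obj->req);

    /*
     * ... then hop back, so the next request is only picked up from the
     * real main loop, never from a nested event loop polling
     * qemu_aio_context.
     */
    aio_co_schedule(iohandler_get_aio_context(), qemu_coroutine_self());
    qemu_coroutine_yield();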
>
> Before this patch, command handlers can't really use callbacks without a
> nested event loop because they are synchronous.

I figure an example for such a nested event loop is BDRV_POLL_WHILE() in
bdrv_truncate(), which runs in the command handler for block_resize.
True?

> The BQL is held, which
> is equivalent to holding the qemu_aio_context lock.

Why is it equivalent?  Are they always taken together?

> But other than that,
> all of the definition of "running in an AioContext" doesn't really apply.
>
> Now with coroutines, you assign them an AioContext, which is the context
> in which BHs reentering the coroutine will be scheduled, e.g. from
> explicit aio_co_schedule(), after qemu_co_sleep_ns() or after waiting
> for a lock like a CoMutex.
>
> qemu_aio_context is what most (if not all) of the existing QMP handlers
> use when they run a nested event loop,

bdrv_truncate() appears to use bdrv_get_aio_context(), which is the
BlockDriverState's AioContext if set, else @qemu_aio_context.

> so running the coroutine in this
> context while calling the handlers will make the transition the easiest.

In the case of block_resize: running it in a coroutine makes
bdrv_truncate() use that coroutine instead of creating one together with
a nested event loop.  "That coroutine" uses @qemu_aio_context.  So does
the nested event loop, *except* when the BlockDriverState has an
AioContext.

I have no idea when a BlockDriverState has an AioContext.  Can you help?

>> We want to run the rest in iohandler_ctx to ensure dispatching happens
>> only in the main loop, as before this patch.
>>
>> Correct?
>
> This part is correct.

Alright, let me try again, so you can point out the next level of my
ignorance :)

QMP monitor I/O happens in I/O thread @mon_iothread, in the I/O thread's
AioContext.  The I/O thread's event loop polls monitor I/O.  The read
handler pushes requests onto the monitor's qmp_requests queue.

The dispatcher pops requests off these queues, and runs command handlers
for them.

Before this patch, the dispatcher runs in a bottom half in the main
thread with the BQL[*] held, in @iohandler_ctx context.  The latter
ensures dispatch happens only from the main loop, not nested event loops.

The command handlers don't care for the AioContext; they're synchronous.
If they need AIO internally, they have to set up a nested event loop.
When they do, they pick the AioContext explicitly, so they still don't
rely on the current context.

The patch moves the dispatcher to a coroutine running in the main thread.
It runs in iohandler_ctx, but switches to qemu_aio_context for executing
command handlers.

The dispatcher keeps running in @iohandler_ctx.  Good.

The command handlers now run in @qemu_aio_context.  Doesn't matter
because "command handlers don't care for the AioContext".  Except when a
command handler has a fast path like bdrv_truncate() has.  For those,
@iohandler_ctx is generally not what we want.  @qemu_aio_context is.
"Generally", since every one needs to be inspected for AioContext
shenanigans.

[*] @qemu_global_mutex, a.k.a. "iothread mutex"; pity anyone who needs to
figure this out from the source code by himself
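By "fast path" above I mean the usual synchronous block-layer wrapper
pattern, roughly like this (heavily simplified sketch with made-up names,
not the actual bdrv_truncate() code):

    /* Hypothetical stand-in for a bdrv_truncate()-style wrapper */
    typedef struct TruncateCo {
        BlockDriverState *bs;
        int64_t offset;
        int ret;                    /* -EINPROGRESS until done */
    } TruncateCo;

    static void coroutine_fn truncate_co_entry(void *opaque)
    {
        TruncateCo *tco = opaque;
        tco->ret = 0;               /* the actual work would happen here */
    }

    static int truncate_sync_sketch(BlockDriverState *bs, int64_t offset)
    {
        TruncateCo tco = { .bs = bs, .offset = offset, .ret = -EINPROGRESS };

        if (qemu_in_coroutine()) {
            /* Fast path: already in a coroutine, e.g. the QMP dispatcher
             * coroutine after this patch.  No nested event loop. */
            truncate_co_entry(&tco);
        } else {
            /* Slow path: spawn a coroutine and run a nested event loop
             * in the BDS's AioContext until it finishes. */
            Coroutine *co = qemu_coroutine_create(truncate_co_entry, &tco);
            bdrv_coroutine_enter(bs, co);
            BDRV_POLL_WHILE(bs, tco.ret == -EINPROGRESS);
        }
        return tco.ret;
    }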