On Wed, Nov 13, 2019 at 05:45:57PM +0100, Markus Armbruster wrote: > Wolfgang Bumiller <w.bumil...@proxmox.com> writes: > > > When a monitor's queue is filled up in handle_qmp_command() > > it gets suspended. It's the dispatcher bh's job currently to > > resume the monitor, which it does after processing an event > > from the queue. However, it is possible for a > > CHR_EVENT_CLOSED event to be processed before before the bh > > is scheduled, which will clear the queue without resuming > > the monitor, thereby preventing the dispatcher from reaching > > the resume() call. > > Any new connections to the qmp socket will be accept()ed and > > show the greeting, but will not respond to any messages sent > > afterwards (as they will not be read from the > > still-suspended socket). > > Fix this by resuming the monitor when clearing a queue which > > was filled up. > > > > Signed-off-by: Wolfgang Bumiller <w.bumil...@proxmox.com> > > --- > > Changes to v1: > > * Update commit message to include the resulting symptoms. > > * Moved the resume code from `monitor_qmp_cleanup_req_queue_locked` to > > `monitor_qmp_cleanup_queues` to avoid an unnecessary resume when > > destroying the monitor (as the `_locked` version is also used by > > `monitor_data_destroy()`. > > * Renamed `monitor_qmp_cleanup_queues` to > > `monitor_qmp_cleanup_queues_and_resume` to reflect the change and be > > verbose about it for potential future users of the function. > > Currently the only user is `monitor_qmp_event()` in the > > `CHR_EVENT_CLOSED` case, which is exactly the problematic case > > currently. > > > > Sorry for the deleay :| > > Same to you (my sorry excuse is KVM Forum). Now we need to hurry to get > this fix into 4.2. Let's try.
Uh, that's already in Rcs. *hurries* > > monitor/qmp.c | 24 ++++++++++++++++++++++-- > > 1 file changed, 22 insertions(+), 2 deletions(-) > > > > diff --git a/monitor/qmp.c b/monitor/qmp.c > > index 9d9e5d8b27..df689aa95e 100644 > > --- a/monitor/qmp.c > > +++ b/monitor/qmp.c > > @@ -75,10 +75,30 @@ static void > > monitor_qmp_cleanup_req_queue_locked(MonitorQMP *mon) > > } > > } > > > > -static void monitor_qmp_cleanup_queues(MonitorQMP *mon) > > +static void monitor_qmp_cleanup_queues_and_resume(MonitorQMP *mon) > > Let's rename to _cleanup_queue_and resume(). The plural is a remnant > from when we also had a response queue. Gone since commit 27656018d86. > > > { > > qemu_mutex_lock(&mon->qmp_queue_lock); > > + > > + /* > > + * Same condition as in monitor_qmp_bh_dispatcher(), but before > > removing an > > + * element from the queue (hence no `- 1`), also, the queue should not > > be > > + * empty either, otherwise the monitor hasn't been suspended yet (or > > was > > + * already resumed). > > + */ > > Comment lines should be wrapped around colum 70. > > > + bool need_resume = (!qmp_oob_enabled(mon) && mon->qmp_requests->length > > > 0) > > + || mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX; > > Now let me digest the comment. Here's condition in > monitor_qmp_bh_dispatcher(): > > need_resume = !qmp_oob_enabled(mon) || > mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX - 1; > > "Same but before removing" is > > bool need_resume = !qmp_oob_enabled(mon) > || mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX; > > That leaves the "also" part. It's been too long since v1, I don't > remember a thing, and I'm too dense today to understand without more > help. Can you help me some more? Not empty: `&& length > 0`. The 2nd part of the disjunction already implies it (as it is `length == max`) so I only added it to the first part. A pointless optimization on my part given that it's supposed to be more easily verified against the comment, so I'll change it to be clearer: ($condition_without_'-1') && (!g_queue_is_empty()), so: bool need_resume = (!qmp_oob_enabled(mon) || mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX) && !g_queue_is_empty(mon->qmp_requests); > > > + > > monitor_qmp_cleanup_req_queue_locked(mon); > > + > > + if (need_resume) { > > + /* > > + * Pairs with the monitor_suspend() in handle_qmp_command() in > > case the > > + * queue gets cleared from a CH_EVENT_CLOSED event before the > > dispatch > > + * bh got scheduled. > > + */ > > CHR_EVENT_CLOSED > > Suggest: > > /* > * handle_qmp_command() suspened the monitor because the > * request queue filled up, to be resumed when the queue has > * space again. We just emptied it; resume the monitor. > */ > > If we want to record the issue that made us fix the missing resume, we > can add: > > * Without this, the monitor would remain suspended forever > * when we get here while the monitor is suspended. An > * unfortunately times CHR_EVENT_CLOSED can do the trick. > > Also update handle_qmp_command()'s comment: > > /* > * Suspend the monitor when we can't queue more requests after > * this one. Dequeuing in monitor_qmp_bh_dispatcher() or > * monitor_qmp_cleanup_queue_and_resume() will resume it. > * Note that when OOB is disabled, we queue at most one command, > * for backward compatibility. > */ will do.