Wolfgang Bumiller <w.bumil...@proxmox.com> writes: > When a monitor's queue is filled up in handle_qmp_command() > it gets suspended. It's the dispatcher bh's job currently to > resume the monitor, which it does after processing an event > from the queue. However, it is possible for a > CHR_EVENT_CLOSED event to be processed before before the bh > is scheduled, which will clear the queue without resuming > the monitor, thereby preventing the dispatcher from reaching > the resume() call. > Any new connections to the qmp socket will be accept()ed and > show the greeting, but will not respond to any messages sent > afterwards (as they will not be read from the > still-suspended socket). > Fix this by resuming the monitor when clearing a queue which > was filled up. > > Signed-off-by: Wolfgang Bumiller <w.bumil...@proxmox.com> > --- > Changes to v1: > * Update commit message to include the resulting symptoms. > * Moved the resume code from `monitor_qmp_cleanup_req_queue_locked` to > `monitor_qmp_cleanup_queues` to avoid an unnecessary resume when > destroying the monitor (as the `_locked` version is also used by > `monitor_data_destroy()`. > * Renamed `monitor_qmp_cleanup_queues` to > `monitor_qmp_cleanup_queues_and_resume` to reflect the change and be > verbose about it for potential future users of the function. > Currently the only user is `monitor_qmp_event()` in the > `CHR_EVENT_CLOSED` case, which is exactly the problematic case currently. > > Sorry for the deleay :|
Same to you (my sorry excuse is KVM Forum). Now we need to hurry to get this fix into 4.2. Let's try. > monitor/qmp.c | 24 ++++++++++++++++++++++-- > 1 file changed, 22 insertions(+), 2 deletions(-) > > diff --git a/monitor/qmp.c b/monitor/qmp.c > index 9d9e5d8b27..df689aa95e 100644 > --- a/monitor/qmp.c > +++ b/monitor/qmp.c > @@ -75,10 +75,30 @@ static void > monitor_qmp_cleanup_req_queue_locked(MonitorQMP *mon) > } > } > > -static void monitor_qmp_cleanup_queues(MonitorQMP *mon) > +static void monitor_qmp_cleanup_queues_and_resume(MonitorQMP *mon) Let's rename to _cleanup_queue_and resume(). The plural is a remnant from when we also had a response queue. Gone since commit 27656018d86. > { > qemu_mutex_lock(&mon->qmp_queue_lock); > + > + /* > + * Same condition as in monitor_qmp_bh_dispatcher(), but before removing > an > + * element from the queue (hence no `- 1`), also, the queue should not be > + * empty either, otherwise the monitor hasn't been suspended yet (or was > + * already resumed). > + */ Comment lines should be wrapped around colum 70. > + bool need_resume = (!qmp_oob_enabled(mon) && mon->qmp_requests->length > > 0) > + || mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX; Now let me digest the comment. Here's condition in monitor_qmp_bh_dispatcher(): need_resume = !qmp_oob_enabled(mon) || mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX - 1; "Same but before removing" is bool need_resume = !qmp_oob_enabled(mon) || mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX; That leaves the "also" part. It's been too long since v1, I don't remember a thing, and I'm too dense today to understand without more help. Can you help me some more? > + > monitor_qmp_cleanup_req_queue_locked(mon); > + > + if (need_resume) { > + /* > + * Pairs with the monitor_suspend() in handle_qmp_command() in case > the > + * queue gets cleared from a CH_EVENT_CLOSED event before the > dispatch > + * bh got scheduled. > + */ CHR_EVENT_CLOSED Suggest: /* * handle_qmp_command() suspened the monitor because the * request queue filled up, to be resumed when the queue has * space again. We just emptied it; resume the monitor. */ If we want to record the issue that made us fix the missing resume, we can add: * Without this, the monitor would remain suspended forever * when we get here while the monitor is suspended. An * unfortunately times CHR_EVENT_CLOSED can do the trick. Also update handle_qmp_command()'s comment: /* * Suspend the monitor when we can't queue more requests after * this one. Dequeuing in monitor_qmp_bh_dispatcher() or * monitor_qmp_cleanup_queue_and_resume() will resume it. * Note that when OOB is disabled, we queue at most one command, * for backward compatibility. */ > + monitor_resume(&mon->common); > + } > + > qemu_mutex_unlock(&mon->qmp_queue_lock); > } > > @@ -332,7 +352,7 @@ static void monitor_qmp_event(void *opaque, int event) > * stdio, it's possible that stdout is still open when stdin > * is closed. > */ > - monitor_qmp_cleanup_queues(mon); > + monitor_qmp_cleanup_queues_and_resume(mon); > json_message_parser_destroy(&mon->parser); > json_message_parser_init(&mon->parser, handle_qmp_command, > mon, NULL);