On Fri, Jul 06, 2018 at 10:09:58AM +0200, Markus Armbruster wrote:
> If I understand the code correctly, the response queue
> mon->qmp.qmp_responses gets drained into the output buffer mon->outbuf
> by bottom half monitor_qmp_bh_responder(). The response queue remains
> small, it's the output buffer that grows without bounds. Your test for
> "limit exceeded" measures the response queue. It needs to measure the
> output buffer instead.
>
> We could drain the response queue only when the output buffer isn't too
> full. I think that makes sense only if we have a compelling use for
> keeping responses in QDict form for a longer time.

Indeed. I didn't really notice that we have an unlimited out_buf there.
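Measuring the output buffer would mean checking how much unflushed data
sits in mon->outbuf instead of counting queued QDicts. A minimal sketch
of what such a check could look like (illustration only: the helper
name and the threshold are made up, while mon->outbuf and
qstring_get_length() are the existing monitor/QString pieces):

=============
/* Arbitrary cap on pending QMP output, for illustration only */
#define QMP_OUTBUF_LEN_MAX (8 * 1024 * 1024)

/* True once the not-yet-flushed output exceeds the cap */
static bool monitor_outbuf_full(Monitor *mon)
{
    return qstring_get_length(mon->outbuf) >= QMP_OUTBUF_LEN_MAX;
}
=============

The suspend check would then test monitor_outbuf_full() rather than
mon->qmp.qmp_responses->length.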
To double-check that the current (queue-length based) patch is working,
I first added some tracepoints:

=============
diff --git a/monitor.c b/monitor.c
index 9eb9f06599..18ab609eab 100644
--- a/monitor.c
+++ b/monitor.c
@@ -375,6 +375,7 @@ static void monitor_qmp_cleanup_req_queue_locked(Monitor *mon)
     while (!g_queue_is_empty(mon->qmp.qmp_requests)) {
         qmp_request_free(g_queue_pop_head(mon->qmp.qmp_requests));
     }
+    trace_monitor_qmp_request_queue(mon, 0);
 }
 
 /* Caller must hold the mon->qmp.qmp_lock */
@@ -383,6 +384,7 @@ static void monitor_qmp_cleanup_resp_queue_locked(Monitor *mon)
     while (!g_queue_is_empty(mon->qmp.qmp_responses)) {
         qobject_unref((QDict *)g_queue_pop_head(mon->qmp.qmp_responses));
     }
+    trace_monitor_qmp_response_queue(mon, 0);
 }
 
 static void monitor_qmp_cleanup_queues(Monitor *mon)
@@ -408,6 +410,7 @@ static void monitor_qmp_try_resume_locked(Monitor *mon)
     if (mon->qmp.need_resume) {
         monitor_resume(mon);
         mon->qmp.need_resume = false;
+        trace_monitor_qmp_resume(mon);
     }
 }
 
@@ -555,6 +558,7 @@ static void qmp_queue_response(Monitor *mon, QDict *rsp)
          */
         qemu_mutex_lock(&mon->qmp.qmp_lock);
         g_queue_push_tail(mon->qmp.qmp_responses, qobject_ref(rsp));
+        trace_monitor_qmp_response_queue(mon, mon->qmp.qmp_responses->length);
         qemu_mutex_unlock(&mon->qmp.qmp_lock);
         qemu_bh_schedule(qmp_respond_bh);
     } else {
@@ -578,6 +582,7 @@ static QDict *monitor_qmp_response_pop_one(Monitor *mon)
 
     qemu_mutex_lock(&mon->qmp.qmp_lock);
     data = g_queue_pop_head(mon->qmp.qmp_responses);
+    trace_monitor_qmp_response_queue(mon, mon->qmp.qmp_responses->length);
     /* In case if we were suspended due to response queue full */
     monitor_qmp_try_resume_locked(mon);
     qemu_mutex_unlock(&mon->qmp.qmp_lock);
@@ -4183,6 +4188,7 @@ static QMPRequest *monitor_qmp_requests_pop_any(void)
     QTAILQ_FOREACH(mon, &mon_list, entry) {
         qemu_mutex_lock(&mon->qmp.qmp_lock);
         req_obj = g_queue_pop_head(mon->qmp.qmp_requests);
+        trace_monitor_qmp_request_queue(mon, mon->qmp.qmp_requests->length);
         qemu_mutex_unlock(&mon->qmp.qmp_lock);
         if (req_obj) {
             break;
         }
@@ -4239,6 +4245,7 @@ static void monitor_qmp_suspend_locked(Monitor *mon)
     assert(mon->qmp.need_resume == false);
     monitor_suspend(mon);
     mon->qmp.need_resume = true;
+    trace_monitor_qmp_suspend(mon);
 }
 
 static void handle_qmp_command(JSONMessageParser *parser, GQueue *tokens)
@@ -4312,6 +4319,7 @@ static void handle_qmp_command(JSONMessageParser *parser, GQueue *tokens)
      * etc. will be delivered to the handler side.
      */
     g_queue_push_tail(mon->qmp.qmp_requests, req_obj);
+    trace_monitor_qmp_request_queue(mon, mon->qmp.qmp_requests->length);
     qemu_mutex_unlock(&mon->qmp.qmp_lock);
 
     /* Kick the dispatcher routine */
diff --git a/trace-events b/trace-events
index c445f54773..bd9dade938 100644
--- a/trace-events
+++ b/trace-events
@@ -50,6 +50,10 @@ handle_qmp_command(void *mon, const char *req) "mon %p req: %s"
 monitor_suspend(void *ptr, int cnt) "mon %p: %d"
 monitor_qmp_cmd_in_band(const char *id) "%s"
 monitor_qmp_cmd_out_of_band(const char *id) "%s"
+monitor_qmp_suspend(void *mon) "mon=%p"
+monitor_qmp_resume(void *mon) "mon=%p"
+monitor_qmp_request_queue(void *mon, int len) "mon=%p len=%d"
+monitor_qmp_response_queue(void *mon, int len) "mon=%p len=%d"
 
 # dma-helpers.c
 dma_blk_io(void *dbs, void *bs, int64_t offset, bool to_dev) "dbs=%p bs=%p offset=%" PRId64 " to_dev=%d"
=============

Then I explicitly hung the response BH with:

=============
diff --git a/monitor.c b/monitor.c
index 18ab609eab..1f38084a4c 100644
--- a/monitor.c
+++ b/monitor.c
@@ -560,7 +560,7 @@ static void qmp_queue_response(Monitor *mon, QDict *rsp)
         g_queue_push_tail(mon->qmp.qmp_responses, qobject_ref(rsp));
         trace_monitor_qmp_response_queue(mon, mon->qmp.qmp_responses->length);
         qemu_mutex_unlock(&mon->qmp.qmp_lock);
-        qemu_bh_schedule(qmp_respond_bh);
+        //qemu_bh_schedule(qmp_respond_bh);
     } else {
         /*
          * Not using monitor I/O thread, i.e. we are in the main thread.
=============

Then this is what I got for the command:

  $ cat cmd_list | qemu-system-x86_64 -qmp stdio -nographic -nodefaults -trace enable="monitor_qmp*"

Result:

9815@1530867603.464498:monitor_qmp_response_queue mon=0x55de6bfc4800 len=1
9815@1530867603.464782:monitor_qmp_suspend mon=0x55de6bfc4800
9815@1530867603.464788:monitor_qmp_request_queue mon=0x55de6bfc4800 len=1
9815@1530867603.489017:monitor_qmp_request_queue mon=0x55de6bfc4800 len=0
9815@1530867603.489030:monitor_qmp_cmd_in_band
9815@1530867603.489050:monitor_qmp_response_queue mon=0x55de6bfc4800 len=2
9815@1530867603.489057:monitor_qmp_resume mon=0x55de6bfc4800 <------- [0]
9815@1530867603.489099:monitor_qmp_request_queue mon=0x55de6bfc4800 len=0
9815@1530867603.489205:monitor_qmp_request_queue mon=0x55de6bfc4800 len=1
9815@1530867603.489311:monitor_qmp_request_queue mon=0x55de6bfc4800 len=0
9815@1530867603.489317:monitor_qmp_cmd_in_band
9815@1530867603.489329:monitor_qmp_request_queue mon=0x55de6bfc4800 len=1
9815@1530867603.489419:monitor_qmp_response_queue mon=0x55de6bfc4800 len=3
9815@1530867603.489432:monitor_qmp_request_queue mon=0x55de6bfc4800 len=0
9815@1530867603.489437:monitor_qmp_cmd_in_band
9815@1530867603.489450:monitor_qmp_response_queue mon=0x55de6bfc4800 len=4
9815@1530867603.489459:monitor_qmp_request_queue mon=0x55de6bfc4800 len=0
9815@1530867603.489465:monitor_qmp_request_queue mon=0x55de6bfc4800 len=1
9815@1530867603.489468:monitor_qmp_request_queue mon=0x55de6bfc4800 len=0
9815@1530867603.489471:monitor_qmp_cmd_in_band
9815@1530867603.489484:monitor_qmp_response_queue mon=0x55de6bfc4800 len=5
9815@1530867603.489492:monitor_qmp_request_queue mon=0x55de6bfc4800 len=0
9815@1530867603.489575:monitor_qmp_request_queue mon=0x55de6bfc4800 len=1
9815@1530867603.489584:monitor_qmp_request_queue mon=0x55de6bfc4800 len=0
9815@1530867603.489595:monitor_qmp_cmd_in_band
9815@1530867603.489614:monitor_qmp_response_queue mon=0x55de6bfc4800 len=6
9815@1530867603.489624:monitor_qmp_request_queue mon=0x55de6bfc4800 len=0
9815@1530867603.489693:monitor_qmp_request_queue mon=0x55de6bfc4800 len=1
9815@1530867603.489703:monitor_qmp_request_queue mon=0x55de6bfc4800 len=0
9815@1530867603.489708:monitor_qmp_cmd_in_band
9815@1530867603.489725:monitor_qmp_response_queue mon=0x55de6bfc4800 len=7
9815@1530867603.489736:monitor_qmp_request_queue mon=0x55de6bfc4800 len=0
9815@1530867603.489818:monitor_qmp_suspend mon=0x55de6bfc4800 <-------------[1]
9815@1530867603.489823:monitor_qmp_request_queue mon=0x55de6bfc4800 len=1
9815@1530867603.489837:monitor_qmp_request_queue mon=0x55de6bfc4800 len=0
9815@1530867603.489847:monitor_qmp_cmd_in_band
9815@1530867603.489874:monitor_qmp_response_queue mon=0x55de6bfc4800 len=8
9815@1530867603.489892:monitor_qmp_request_queue mon=0x55de6bfc4800 len=0
[hangs here]

The suspend at [1] comes right after the response queue reaches N-1, so
I suppose the current patch is working. Note that the suspend/resume
pair around [0] only covers the first "qmp_capabilities" command: before
OOB is enabled we still emulate the old monitor behavior, so that is by
design.
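For clarity, the logic being exercised is roughly the following (just a
sketch of the idea, not the actual patch hunk; the constant name
QMP_RESP_QUEUE_LEN_MAX is made up here to stand for the limit N above):

=============
/* In handle_qmp_command(), before queuing the new request: suspend the
 * monitor once the response queue is one short of its limit.  The
 * monitor_qmp_try_resume_locked() call shown in the first diff resumes
 * it when monitor_qmp_response_pop_one() drains the queue again. */
if (mon->qmp.qmp_responses->length >= QMP_RESP_QUEUE_LEN_MAX - 1) {
    monitor_qmp_suspend_locked(mon);
}
g_queue_push_tail(mon->qmp.qmp_requests, req_obj);
=============

That is consistent with suspend [1] firing right after the response
queue reports len=7.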
>
> Marc-André has a patch to cut out the response queue. Whether it still
> makes sense I don't know.
> [PATCH v3 03/38] Revert "qmp: isolate responses into io thread"

I think we discussed this before; I agree with Marc-André that most of
the code related to response flushing should be thread-safe now. It was
not when we started to draft the out-of-band feature, and that is the
major reason why I refactored my old code to explicitly separate the
dispatcher and the responder, so that the responder could be isolated
and live together with the parser.

I don't see an obvious reason to remove this layer of isolation as long
as it works well (the code involved is not that much, and IMHO the
response queue is a clean interface that we might want to use in the
future too). My major concern now is that we're in the rc phase for
QEMU 3.0, and this at least seems to be a big change to the monitor
layer. We could revert to the no-responder way if we really wanted to,
but then we had better test as thoroughly as we did when we started to
play with out-of-band, to make sure we don't miss anything important
(and I guess the patch will need a non-trivial rebase after the recent
work on the queues). Considering that, I don't really see much benefit
in removing that layer.

Or, is there any other reason to remove the response queue that I'm not
aware of?

Thanks,

-- 
Peter Xu