Re: [Qemu-devel] [PATCH v7 3/7] monitor: suspend monitor instead of send CMD_DROP

Peter Xu Mon, 03 Sep 2018 20:34:00 -0700

On Mon, Sep 03, 2018 at 03:16:55PM +0200, Markus Armbruster wrote:
> Peter Xu <pet...@redhat.com> writes:
> 
> > On Mon, Sep 03, 2018 at 09:38:00AM +0200, Markus Armbruster wrote:
> >> Peter Xu <pet...@redhat.com> writes:
> >> 
> >> > When we received too many qmp commands, previously we'll send
> >> > COMMAND_DROPPED events to monitors, then we'll drop the requests.  Now
> >> > instead of dropping the command we stop reading from input when we
> >> > notice the queue is getting full.  Note that here since we removed the
> >> > need_resume flag we need to be _very_ careful on the suspend/resume
> >> > paring on the conditions since unmatched condition checks will hang
> >> > death the monitor.  Meanwhile, now we will need to read the queue length
> >> > to decide whether we'll need to resume the monitor so we need to take
> >> > the queue lock again even after popping from it.
> >> >
> >> > Signed-off-by: Peter Xu <pet...@redhat.com>
> >> 
> >> The subject's "send CMD_DROP" fails to highlight the important part:
> >> we're no longer dropping commands.  Moreover, the commit message could
> >> do a better job explaining the tradeoffs.  Here's my try:
> >> 
> >>     monitor: Suspend monitor instead dropping commands
> >> 
> >>     When a QMP client sends in-band commands more quickly that we can
> >>     process them, we can either queue them without limit (QUEUE), drop
> >>     commands when the queue is full (DROP), or suspend receiving commands
> >>     when the queue is full (SUSPEND).  None of them is ideal:
> >> 
> >>     * QUEUE lets a misbehaving client make QEMU eat memory without bounds.
> >>       Not such a hot idea.
> >> 
> >>     * With DROP, the client has to cope with dropped in-band commands.  To
> >>       inform the client, we send a COMMAND_DROPPED event then.  The event 
> >> is
> >>       flawed by design in two ways: it's ambiguous (see commit 
> >> d621cfe0a17),
> >>       and it brings back the "eat memory without bounds" problem.
> >> 
> >>     * With SUSPEND, the client has to manage the flow of in-band commands 
> >> to
> >>       keep the monitor available for out-of-band commands.
> >> 
> >>     We currently DROP.  Switch to SUSPEND.
> >> 
> >>     Managing the flow of in-band commands to keep the monitor available for
> >>     out-of-band commands isn't really hard: just count the number of
> >>     "outstanding" in-band commands (commands sent minus replies received),
> >>     and if it exceeds the limit, hold back additional ones until it drops
> >>     below the limit again.
> >
> > (Will the simplest QMP client just block at the write() system call
> >  without notice?
> 
> Yes.
> 
> When you stop reading from a socket or pipe, the peer eventually can't
> write more.  "Eventually", because the TCP connection or pipe buffers
> some.
> 
> A naive client using a blocking write() will block then.
> 
> A slightly more sophisticated client using a non-blocking write() has
> its write() fail with EAGAIN or EWOULDBLOCK.
> 
> In both cases, OOB commands may be stuck in the TCP connection's /
> pipe's buffer.
> 
> 
> >                   Anyway, the SUSPEND is ideal enough to me... :)
> >
> >> 
> >>     Note that we need to be careful pairing the suspend with a resume, or
> >>     else the monitor will hang, possibly forever.
> >
> > Will take your version, thanks.
> >
> >> 
> >> 
> >> > ---
> >> >  monitor.c | 33 +++++++++++++++------------------
> >> >  1 file changed, 15 insertions(+), 18 deletions(-)
> >> >
> >> > diff --git a/monitor.c b/monitor.c
> >> > index 3b90c9eb5f..9e6cad2af6 100644
> >> > --- a/monitor.c
> >> > +++ b/monitor.c
> >> > @@ -89,6 +89,8 @@
> >> >  #include "hw/s390x/storage-attributes.h"
> >> >  #endif
> >> >  
> >> > +#define  QMP_REQ_QUEUE_LEN_MAX  (8)
> >> > +
> >> 
> >> Let's drop the parenthesis.
> >
> > Ok.
> >
> >> 
> >> >  /*
> >> >   * Supported types:
> >> >   *
> >> > @@ -4124,29 +4126,33 @@ static QMPRequest 
> >> > *monitor_qmp_requests_pop_any(void)
> >> >  static void monitor_qmp_bh_dispatcher(void *data)
> >> >  {
> >> >      QMPRequest *req_obj = monitor_qmp_requests_pop_any();
> >> > +    Monitor *mon;
> >> >      QDict *rsp;
> >> >      bool need_resume;
> >> >  
> >> >      if (!req_obj) {
> >> >          return;
> >> >      }
> >> > -
> >> 
> >> Let's keep the blank line.
> >
> > Ok.
> >
> >> 
> >> > +    mon = req_obj->mon;
> >> >      /*  qmp_oob_enabled() might change after "qmp_capabilities" */
> >> > -    need_resume = !qmp_oob_enabled(req_obj->mon);
> >> > +    qemu_mutex_lock(&mon->qmp.qmp_queue_lock);
> >> > +    need_resume = !qmp_oob_enabled(req_obj->mon) ||
> >> > +        mon->qmp.qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX - 1;
> >> 
> >> Note for later: this is the condition guarding monitor_resume().
> >> 
> >> Is this race-free?  We have two criticial sections, one in
> >> monitor_qmp_requests_pop_any(), and one here.  What if two threads
> >> interleave like this
> >> 
> >>     thread 1                            thread 2
> >>     ------------------------------------------------------------------
> >>     monitor_qmp_requests_pop_any()
> >>         lock
> >>         dequeue
> >>         unlock
> >>                                         monitor_qmp_requests_pop_any()
> >>                                             lock
> >>                                             dequeue
> >>                                             unlock
> >>                                         lock
> >>                                         check queue length
> >>                                         unlock
> >>                                         if queue length demands it
> >>                                              monitor_resume() 
> >>     lock
> >>     check queue length
> >>     unlock
> >>     if queue length demands it
> >>         monitor_resume()
> >> 
> >> Looks racy to me: if we start with the queue full (and the monitor
> >> suspended), both threads' check queue length see length
> >> QMP_REQ_QUEUE_LEN_MAX - 2, and neither thread resumes the monitor.
> >> Oops.
> >> 
> >> Simplest fix is probably moving the resume into
> >> monitor_qmp_requests_pop_any()'s critical section.
> >
> > But we only have one QMP dispatcher, right?  So IMHO we can't have two
> > threads doing monitor_qmp_requests_pop_any() at the same time.
> 
> You're right, we currently run monitor_qmp_bh_dispatcher() only in the
> main thread, and a thread can't race with itself.  But the main thread
> can still race with handle_qmp_command() running in mon_iothread.
> 
>     main thread                         mon_iothread
>     ------------------------------------------------------------------
>     monitor_qmp_requests_pop_any()
>         lock
>         dequeue
>         unlock
>                                         lock
>                                         if queue length demands it
>                                             monitor_suspend()
>                                         push onto queue
>                                         unlock
>     lock
>     check queue length
>     unlock
>     if queue length demands it
>         monitor_resume()           <---------------------- [1]
> 
> If we start with the queue full (and the monitor suspended), the main
> thread first determines it'll need to resume.  mon_iothread then fills
> the queue again, and suspends the suspended monitor some more.  If I


(Side note: here it's tricky that when we "determines to resume" we
 didn't really resume, so we are still suspended, hence the
 mon_iothread cannot fill the queue again until the resume at [1]
 above.  Hence IMHO we're safe here.  But I agree that it's still racy
 in other cases.)

> read the code correctly, this increments mon->suspend_cnt from 1 to 2.
> Finally, the main thread checks the queue length:
> 
>        need_resume = !qmp_oob_enabled(req_obj->mon) ||
>            mon->qmp.qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX - 1;
> 
> The yields false, because ->length is now QMP_REQ_QUEUE_LEN_MAX.  The
> main thread does *not* resume the monitor.
> 
> State after this: queue full, mon->suspend_cnt == 2.  Bad, but we'll
> recover on the dequeue after next: the next dequeue decrements
> mon->suspend_cnt to 1 without resuming the monitor, the one after that
> will decrement it to 0 and resume the monitor.
> 
> However, if handle_qmp_command() runs in this spot often enough to push
> mon->suspend_cnt above QMP_REQ_QUEUE_LEN_MAX, the monitor will remain
> suspended forever.
> 
> I'm teaching you multiprogramming 101 here.  The thing that should make
> the moderately experienced nose twitch is the anti-pattern
> 
>     begin critical section
>     do something
>     end critical section
>     begin critical section
>     if we just did X, state must be Y, so we must now do Z
>     end critical section
> 
> The part "if we just did X, state must be Y" is *wrong*.  You have no
> idea what the state is, since code running between the two critical
> sections may have changed it.
> 
> Here,
> 
>     X = dequeued from a full queue"
>     Y = "mon->suspend_cnt == 1"
>     Z = monitor_resume() to resume the monitor
> 
> I showed that Y does not hold.
> 
> On a slightly more abstract level, this anti-pattern applies:
> 
>     begin critical section
>     start messing with invariant
>     end critical section
>     // invariant does not hold here, oopsie
>     begin critical section
>     finish messing with invariant
>     end critical section
> 
> The invariant here is "monitor suspended exactly when the queue is
> full".  Your first critical section can make the queue non-full.  The
> second one resumes.  The invariant doesn't hold in between, and all bets
> are off.
> 
> To maintain the invariant, you *have* to enqueue and suspend in a single
> critical section (which you do), *and* dequeue and resume in a single
> critical section (which you don't).

Thank you for teaching the lesson for me.

> 
> > But I fully agree that it'll be nicer to keep them together (e.g. in
> > case we allow to dispatch two commands some day), but then if you see
> > it'll be something like the old req_obj->need_resume flag, but we set
> > it in monitor_qmp_requests_pop_any() instead.  If so, I really prefer
> > we just keep that need_resume flag for per-monitor by dropping
> > 176160ce78 and keep my patch to move that flag into monitor struct
> > (which I dropped after the rebase of this version).
> 
> Please have another look.

Again, if we want to put pop+check into the same critical section,
then we possibly need to return something from the
monitor_qmp_requests_pop_any() to show the queue length information,
then I would prefer to keep the per-monitor need_resume flag.

What do you think?

> 
> >> > +    qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);
> >> >      if (req_obj->req) {
> >> >          trace_monitor_qmp_cmd_in_band(qobject_get_try_str(req_obj->id) 
> >> > ?: "");
> >> > -        monitor_qmp_dispatch(req_obj->mon, req_obj->req, req_obj->id);
> >> > +        monitor_qmp_dispatch(mon, req_obj->req, req_obj->id);
> >> >      } else {
> >> >          assert(req_obj->err);
> >> >          rsp = qmp_error_response(req_obj->err);
> >> >          req_obj->err = NULL;
> >> > -        monitor_qmp_respond(req_obj->mon, rsp, NULL);
> >> > +        monitor_qmp_respond(mon, rsp, NULL);
> >> >          qobject_unref(rsp);
> >> >      }
> >> >  
> >> >      if (need_resume) {
> >> >          /* Pairs with the monitor_suspend() in handle_qmp_command() */
> >> > -        monitor_resume(req_obj->mon);
> >> > +        monitor_resume(mon);
> >> >      }
> >> >      qmp_request_free(req_obj);
> >> 
> >> The replacement of req_obj->mon by mon makes this change less clear than
> >> it could be.  Let's figure out the possible race, and then we'll see
> >> whether we'll still want this replacement.
> >
> > Sure.
> >
> >> 
> >> >  
> >> > @@ -4154,8 +4160,6 @@ static void monitor_qmp_bh_dispatcher(void *data)
> >> >      qemu_bh_schedule(qmp_dispatcher_bh);
> >> >  }
> >> >  
> >> > -#define  QMP_REQ_QUEUE_LEN_MAX  (8)
> >> > -
> >> >  static void handle_qmp_command(void *opaque, QObject *req, Error *err)
> >> >  {
> >> >      Monitor *mon = opaque;
> >> > @@ -4205,19 +4209,12 @@ static void handle_qmp_command(void *opaque, 
> >> > QObject *req, Error *err)
> >>        /*
> >>         * If OOB is not enabled on the current monitor, we'll emulate the
> >>         * old behavior that we won't process the current monitor any more
> >>         * until it has responded.  This helps make sure that as long as
> >>         * OOB is not enabled, the server will never drop any command.
> >>         */
> >> 
> >> This comment is now stale, we don't drop commands anymore.
> >
> > AFAIU it's describing [1] below but nothing related to COMMAND_DROP?
> 
> The comment suggests the server can drop commands when OOB is enabled.
> This is no longer correct after this patch.

Ah yes the last sentense, sorry.

No matter what, I'm taking your suggestion below so the paragraph will
be dropped.

Thanks,

> 
> >> 
> >> >      if (!qmp_oob_enabled(mon)) {
> >> >          monitor_suspend(mon);
> >
> > [1]
> >
> >> >      } else {
> >> > -        /* Drop the request if queue is full. */
> >> > -        if (mon->qmp.qmp_requests->length >= QMP_REQ_QUEUE_LEN_MAX) {
> >> > -            qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);
> >> > +        if (mon->qmp.qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX - 1) 
> >> > {
> >> >              /*
> >> > -             * FIXME @id's scope is just @mon, and broadcasting it is
> >> > -             * wrong.  If another monitor's client has a command with
> >> > -             * the same ID in flight, the event will incorrectly claim
> >> > -             * that command was dropped.
> >> > +             * If this is _exactly_ the last request that we can
> >> > +             * queue, we suspend the monitor right now.
> >> >               */
> >> > -            qapi_event_send_command_dropped(id,
> >> > -                                            
> >> > COMMAND_DROP_REASON_QUEUE_FULL);
> >> > -            qmp_request_free(req_obj);
> >> > -            return;
> >> > +            monitor_suspend(mon);
> >> >          }
> >> >      }
> >> 
> >> The conditional and its comments look unbalanced.  Moreover, the
> >> condition is visually dissimilar to the one guarding the matching
> >> monitor_resume().  What about:
> >> 
> >>        /*
> >>         * Suspend the monitor when we can't queue more requests after
> >>         * this one.  Dequeuing in monitor_qmp_bh_dispatcher() will resume
> >>         * it.
> >>         * Note that when OOB is disabled, we queue at most one command,
> >>         * for backward compatibility.
> >>         */
> >>        if (!qmp_oob_enabled(mon) ||
> >>            mon->qmp.qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX - 1) {
> >>            monitor_suspend(mon);
> >>        }
> >> 
> >> You'll have to update the comment if you move the resume to
> >> monitor_qmp_requests_pop_any().
> >> 
> >> Next:
> >> 
> >>        /*
> >>         * Put the request to the end of queue so that requests will be
> >>         * handled in time order.  Ownership for req_obj, req, id,
> >>         * etc. will be delivered to the handler side.
> >>         */
> >> 
> >> Suggest asserting mon->qmp.qmp_requests->length < QMP_REQ_QUEUE_LEN_MAX
> >> here.
> >
> > Sure I can do this.
> >
> >> 
> >>        g_queue_push_tail(mon->qmp.qmp_requests, req_obj);
> >>        qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);
> >
> > Thanks,

Regards,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH v7 3/7] monitor: suspend monitor instead of send CMD_DROP

Reply via email to