On Tue, Sep 04, 2018 at 08:17:54AM +0200, Markus Armbruster wrote:
> Peter Xu <pet...@redhat.com> writes:
> 
> > On Mon, Sep 03, 2018 at 03:16:55PM +0200, Markus Armbruster wrote:
> >> Peter Xu <pet...@redhat.com> writes:
> >> 
> >> > On Mon, Sep 03, 2018 at 09:38:00AM +0200, Markus Armbruster wrote:
> >> >> Peter Xu <pet...@redhat.com> writes:
> >> >> 
> >> >> > Previously, when we received too many QMP commands, we would send COMMAND_DROPPED events to monitors and then drop the requests. Now, instead of dropping the command, we stop reading from input when we notice the queue is getting full. Note that since we removed the need_resume flag, we need to be _very_ careful about pairing the suspend/resume conditions, because unmatched condition checks will hang the monitor. Meanwhile, we now need to read the queue length to decide whether to resume the monitor, so we have to take the queue lock again even after popping from it.
> >> >> >
> >> >> > Signed-off-by: Peter Xu <pet...@redhat.com>
> >> >> 
> >> >> The subject's "send CMD_DROP" fails to highlight the important part: we're no longer dropping commands. Moreover, the commit message could do a better job explaining the tradeoffs. Here's my try:
> >> >> 
> >> >>     monitor: Suspend monitor instead of dropping commands
> >> >> 
> >> >>     When a QMP client sends in-band commands more quickly than we can
> >> >>     process them, we can either queue them without limit (QUEUE), drop
> >> >>     commands when the queue is full (DROP), or suspend receiving commands
> >> >>     when the queue is full (SUSPEND). None of them is ideal:
> >> >> 
> >> >>     * QUEUE lets a misbehaving client make QEMU eat memory without bounds.
> >> >>       Not such a hot idea.
> >> >> 
> >> >>     * With DROP, the client has to cope with dropped in-band commands. To
> >> >>       inform the client, we send a COMMAND_DROPPED event then. The event is
> >> >>       flawed by design in two ways: it's ambiguous (see commit d621cfe0a17),
> >> >>       and it brings back the "eat memory without bounds" problem.
> >> >> 
> >> >>     * With SUSPEND, the client has to manage the flow of in-band commands to
> >> >>       keep the monitor available for out-of-band commands.
> >> >> 
> >> >>     We currently DROP. Switch to SUSPEND.
> >> >> 
> >> >>     Managing the flow of in-band commands to keep the monitor available for
> >> >>     out-of-band commands isn't really hard: just count the number of
> >> >>     "outstanding" in-band commands (commands sent minus replies received),
> >> >>     and if it exceeds the limit, hold back additional ones until it drops
> >> >>     below the limit again.
> >> > 
> >> > (Will the simplest QMP client just block at the write() system call without notice?)
> >> 
> >> Yes.
> >> 
> >> When you stop reading from a socket or pipe, the peer eventually can't write more. "Eventually", because the TCP connection or pipe buffers some.
> >> 
> >> A naive client using a blocking write() will block then.
> >> 
> >> A slightly more sophisticated client using a non-blocking write() has its write() fail with EAGAIN or EWOULDBLOCK.
> >> 
> >> In both cases, OOB commands may be stuck in the TCP connection's / pipe's buffer.
> >> 
> >> > Anyway, the SUSPEND is ideal enough to me... :)
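To make the counting idea above concrete, here is a toy sketch of the client-side bookkeeping. It is plain C, not QEMU code and not a real QMP client; all names are made up, and the limit of 8 merely mirrors QMP_REQ_QUEUE_LEN_MAX from the patch:

    #include <stdbool.h>
    #include <stdio.h>

    /* Count in-band commands that were sent but not yet answered, and
     * hold back new ones while the count sits at the limit.  Out-of-band
     * commands are not counted, so they stay available. */
    #define INBAND_LIMIT 8

    static unsigned outstanding;   /* commands sent minus replies received */

    static bool can_send_inband(void)
    {
        return outstanding < INBAND_LIMIT;
    }

    static void sent_inband(void)
    {
        outstanding++;
    }

    static void got_reply(void)
    {
        outstanding--;
    }

    int main(void)
    {
        for (int i = 0; i < 20; i++) {
            if (!can_send_inband()) {
                /* Limit reached: a real client would now wait for a reply
                 * (or send only out-of-band commands) before continuing. */
                printf("command %d held back (%u outstanding)\n", i, outstanding);
                got_reply();           /* pretend a reply arrived */
            }
            sent_inband();
        }
        printf("%u commands still outstanding\n", outstanding);
        return 0;
    }

A real client would decrement the counter when it actually reads a reply off the socket rather than pretending one arrived, but the bookkeeping is no more involved than this.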
> >> > 
> >> >> 
> >> >> Note that we need to be careful pairing the suspend with a resume, or else the monitor will hang, possibly forever.
> >> > 
> >> > Will take your version, thanks.
> >> > 
> >> >> 
> >> >> > ---
> >> >> >  monitor.c | 33 +++++++++++++++------------------
> >> >> >  1 file changed, 15 insertions(+), 18 deletions(-)
> >> >> > 
> >> >> > diff --git a/monitor.c b/monitor.c
> >> >> > index 3b90c9eb5f..9e6cad2af6 100644
> >> >> > --- a/monitor.c
> >> >> > +++ b/monitor.c
> >> >> > @@ -89,6 +89,8 @@
> >> >> >  #include "hw/s390x/storage-attributes.h"
> >> >> >  #endif
> >> >> >  
> >> >> > +#define QMP_REQ_QUEUE_LEN_MAX (8)
> >> >> > +
> >> >> 
> >> >> Let's drop the parenthesis.
> >> > 
> >> > Ok.
> >> > 
> >> >> >  /*
> >> >> >   * Supported types:
> >> >> >   *
> >> >> > @@ -4124,29 +4126,33 @@ static QMPRequest *monitor_qmp_requests_pop_any(void)
> >> >> >  static void monitor_qmp_bh_dispatcher(void *data)
> >> >> >  {
> >> >> >      QMPRequest *req_obj = monitor_qmp_requests_pop_any();
> >> >> > +    Monitor *mon;
> >> >> >      QDict *rsp;
> >> >> >      bool need_resume;
> >> >> >  
> >> >> >      if (!req_obj) {
> >> >> >          return;
> >> >> >      }
> >> >> > -
> >> >> 
> >> >> Let's keep the blank line.
> >> > 
> >> > Ok.
> >> > 
> >> >> > +    mon = req_obj->mon;
> >> >> >      /* qmp_oob_enabled() might change after "qmp_capabilities" */
> >> >> > -    need_resume = !qmp_oob_enabled(req_obj->mon);
> >> >> > +    qemu_mutex_lock(&mon->qmp.qmp_queue_lock);
> >> >> > +    need_resume = !qmp_oob_enabled(req_obj->mon) ||
> >> >> > +        mon->qmp.qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX - 1;
> >> >> 
> >> >> Note for later: this is the condition guarding monitor_resume().
> >> >> 
> >> >> Is this race-free? We have two critical sections, one in monitor_qmp_requests_pop_any(), and one here. What if two threads interleave like this:
> >> >> 
> >> >>     thread 1                                thread 2
> >> >>     ------------------------------------------------------------------
> >> >>     monitor_qmp_requests_pop_any()
> >> >>         lock
> >> >>         dequeue
> >> >>         unlock
> >> >>                                             monitor_qmp_requests_pop_any()
> >> >>                                                 lock
> >> >>                                                 dequeue
> >> >>                                                 unlock
> >> >>     lock
> >> >>     check queue length
> >> >>     unlock
> >> >>     if queue length demands it
> >> >>         monitor_resume()
> >> >>                                             lock
> >> >>                                             check queue length
> >> >>                                             unlock
> >> >>                                             if queue length demands it
> >> >>                                                 monitor_resume()
> >> >> 
> >> >> Looks racy to me: if we start with the queue full (and the monitor suspended), both threads' queue length checks see length QMP_REQ_QUEUE_LEN_MAX - 2, and neither thread resumes the monitor. Oops.
> >> >> 
> >> >> Simplest fix is probably moving the resume into monitor_qmp_requests_pop_any()'s critical section.
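For illustration, that "simplest fix" might look roughly like the sketch below. The helper name is made up, it handles only a single monitor (the real monitor_qmp_requests_pop_any() walks all monitors), and lock ordering between the queue lock and whatever monitor_resume() touches would still need checking, so treat it as a shape, not a patch:

    /* Hypothetical sketch, not actual QEMU code: the dequeue and the
     * resume decision share one critical section, so no other thread
     * can observe or change the queue between them. */
    static QMPRequest *monitor_qmp_requests_pop_one(Monitor *mon)
    {
        QMPRequest *req_obj;

        qemu_mutex_lock(&mon->qmp.qmp_queue_lock);
        req_obj = g_queue_pop_head(mon->qmp.qmp_requests);
        if (req_obj &&
            (!qmp_oob_enabled(mon) ||
             mon->qmp.qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX - 1)) {
            /* We just popped from a full queue (or OOB is off): let the
             * monitor read input again. */
            monitor_resume(mon);
        }
        qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);

        return req_obj;
    }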
> >> > But we only have one QMP dispatcher, right? So IMHO we can't have two threads doing monitor_qmp_requests_pop_any() at the same time.
> >> 
> >> You're right, we currently run monitor_qmp_bh_dispatcher() only in the main thread, and a thread can't race with itself. But the main thread can still race with handle_qmp_command() running in mon_iothread.
> >> 
> >>     main thread                             mon_iothread
> >>     ------------------------------------------------------------------
> >>     monitor_qmp_requests_pop_any()
> >>         lock
> >>         dequeue
> >>         unlock
> >>                                             lock
> >>                                             if queue length demands it
> >>                                                 monitor_suspend()
> >>                                             push onto queue
> >>                                             unlock
> >>     lock
> >>     check queue length
> >>     unlock
> >>     if queue length demands it
> >>         monitor_resume()  <---------------------- [1]
> >> 
> >> If we start with the queue full (and the monitor suspended), the main thread first determines it'll need to resume. mon_iothread then fills the queue again, and suspends the suspended monitor some more. If I
> > 
> > (Side note: here it's tricky that when we "determine to resume" we haven't really resumed yet, so we are still suspended, hence mon_iothread cannot fill the queue again until the resume at [1]
> 
> You're right.
> 
> > above. Hence IMHO we're safe here. But I agree that it's still racy in other cases.)
> 
> Maybe.
> 
> Getting threads to cooperate safely is hard. Key to success is making things as simple and obvious as we can. Suitable invariants help.
> 
> They help even more when they're documented and checked with assertions. Perhaps you can think of ways to do that.
> 
> >> read the code correctly, this increments mon->suspend_cnt from 1 to 2. Finally, the main thread checks the queue length:
> >> 
> >>     need_resume = !qmp_oob_enabled(req_obj->mon) ||
> >>         mon->qmp.qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX - 1;
> >> 
> >> This yields false, because ->length is now QMP_REQ_QUEUE_LEN_MAX. The main thread does *not* resume the monitor.
> >> 
> >> State after this: queue full, mon->suspend_cnt == 2. Bad, but we'll recover on the dequeue after next: the next dequeue decrements mon->suspend_cnt to 1 without resuming the monitor, and the one after that will decrement it to 0 and resume the monitor.
> >> 
> >> However, if handle_qmp_command() runs in this spot often enough to push mon->suspend_cnt above QMP_REQ_QUEUE_LEN_MAX, the monitor will remain suspended forever.
> >> 
> >> I'm teaching you multiprogramming 101 here. The thing that should make the moderately experienced nose twitch is the anti-pattern
> >> 
> >>     begin critical section
> >>     do something
> >>     end critical section
> >>     begin critical section
> >>     if we just did X, state must be Y, so we must now do Z
> >>     end critical section
> >> 
> >> The part "if we just did X, state must be Y" is *wrong*. You have no idea what the state is, since code running between the two critical sections may have changed it.
> >> 
> >> Here,
> >> 
> >>     X = "dequeued from a full queue"
> >>     Y = "mon->suspend_cnt == 1"
> >>     Z = monitor_resume() to resume the monitor
> >> 
> >> I showed that Y does not hold.
> >> 
> >> On a slightly more abstract level, this anti-pattern applies:
> >> 
> >>     begin critical section
> >>     start messing with invariant
> >>     end critical section
> >>     // invariant does not hold here, oopsie
> >>     begin critical section
> >>     finish messing with invariant
> >>     end critical section
> >> 
> >> The invariant here is "monitor suspended exactly when the queue is full". Your first critical section can make the queue non-full. The second one resumes. The invariant doesn't hold in between, and all bets are off.
> >> 
> >> To maintain the invariant, you *have* to enqueue and suspend in a single critical section (which you do), *and* dequeue and resume in a single critical section (which you don't).
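On the "documented and checked with assertions" point: one way to write that invariant down could be a tiny helper along these lines. The name is hypothetical and this is not part of the patch; it assumes the caller holds mon->qmp.qmp_queue_lock, that OOB is enabled, and that the full queue is the only reason this monitor can be suspended:

    /* Hypothetical helper, not actual QEMU code: assert the invariant
     * "monitor suspended exactly when the queue is full".  Caller must
     * hold mon->qmp.qmp_queue_lock. */
    static void monitor_qmp_assert_flow_control(const Monitor *mon)
    {
        bool full = mon->qmp.qmp_requests->length >= QMP_REQ_QUEUE_LEN_MAX;
        bool suspended = mon->suspend_cnt > 0;

        assert(full == suspended);
    }

Calling something like this at the end of handle_qmp_command()'s and the dispatcher's critical sections would make a violation fail loudly instead of quietly leaving the monitor suspended.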
> > 
> > Thank you for teaching the lesson for me.
> > 
> >> 
> >> > But I fully agree that it'll be nicer to keep them together (e.g. in case we allow dispatching two commands some day), but then it'll be something like the old req_obj->need_resume flag, except that we set it in monitor_qmp_requests_pop_any() instead. If so, I really prefer to just keep that need_resume flag per-monitor by dropping 176160ce78 and keeping my patch that moves the flag into the monitor struct (which I dropped after the rebase of this version).
> >> 
> >> Please have another look.
> > 
> > Again, if we want to put pop+check into the same critical section, then we possibly need to return something from monitor_qmp_requests_pop_any() to expose the queue length information, and then I would prefer to keep the per-monitor need_resume flag.
> > 
> > What do you think?
> 
> Options:
> 
> * Save rather than recompute @need_resume
> 
>   This gets rid of the second critical section.

This one I always liked. It also avoids potential risk when the conditions become more complicated (right now the condition only contains two checks). Meanwhile...

> * Have monitor_qmp_requests_pop_any() return @need_resume
> 
>   This merges the second critical section into the first.

(I don't like this one much...)

> * Have a single critical section in monitor_qmp_bh_dispatcher()
> 
>   This replaces both critical sections by a single larger one.
>   Callers of monitor_qmp_bh_dispatcher() must hold monitor_lock.

... I like this one too, since it's clean enough at least for now and it won't force you to drop a patch already queued in monitor-next. I'll use this one if no one disagrees.

> * Other ideas?
> 
> The question to ask regardless of option: what invariant do the critical sections maintain?
> 
> I proposed one: monitor suspended exactly when the queue is full.
> 
> handle_qmp_command()'s critical section maintains it: it suspends when its enqueue fills up the queue.
> 
> monitor_qmp_bh_dispatcher()'s critical sections do not. Even if we reduce them from two to one, the resulting single critical section can't, as long as we resume only after the dequeued command has completed, because that would require holding monitor_lock while the command executes. So, to maintain this invariant, we need to rethink monitor_qmp_bh_dispatcher(). Why can't we resume right after dequeuing?

Makes sense. I can do that in my next post if you like.
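For completeness, the enqueue side Markus refers to already has that shape: suspend and push sit in one critical section, so the queue can never be observed longer than QMP_REQ_QUEUE_LEN_MAX. A simplified sketch of that path, not the exact patch (req_obj and mon are assumed to be in scope inside handle_qmp_command()):

    /* Enqueue side, simplified: suspend and push under the same lock. */
    qemu_mutex_lock(&mon->qmp.qmp_queue_lock);
    if (!qmp_oob_enabled(mon) ||
        mon->qmp.qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX - 1) {
        /* This push fills the queue (or OOB is off): stop reading
         * input until the dispatcher pops and resumes us. */
        monitor_suspend(mon);
    }
    g_queue_push_tail(mon->qmp.qmp_requests, req_obj);
    qemu_mutex_unlock(&mon->qmp.qmp_queue_lock);

Pairing this with a dispatcher that resumes right after the pop, inside its own single critical section, keeps the invariant intact on both ends.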
> 
> You're of course welcome to propose a different invariant.
> 
> [...]

Thanks,

-- 
Peter Xu