* Daniel P. Berrange (berra...@redhat.com) wrote:
> On Wed, Sep 06, 2017 at 11:57:05AM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrange (berra...@redhat.com) wrote:
> > > On Wed, Sep 06, 2017 at 11:48:51AM +0100, Dr. David Alan Gilbert wrote:
> > > > * Daniel P. Berrange (berra...@redhat.com) wrote:
> > > > > On Wed, Sep 06, 2017 at 10:48:46AM +0100, Dr. David Alan Gilbert wrote:
> > > > > > * Daniel P. Berrange (berra...@redhat.com) wrote:
> > > > > > > On Wed, Aug 23, 2017 at 02:51:03PM +0800, Peter Xu wrote:
> > > > > > > > v2:
> > > > > > > > - fixed "make check" error that patchew reported
> > > > > > > > - moved the thread_join earlier in monitor_data_destroy(), before resources are released
> > > > > > > > - added one new patch (current patch 3) that fixes a nasty race condition with IOWatchPoll. Please see the commit message for more information.
> > > > > > > > - added a g_main_context_wakeup() to make sure the separate loop thread can always be kicked when we want to destroy the per-monitor threads
> > > > > > > > - added one new patch (current patch 8) to introduce a migration mgmt lock for migrate_incoming
> > > > > > > >
> > > > > > > > This is an extended work for migration postcopy recovery. This series is tested with the following series to make sure it solves the monitor hang problem that we have encountered for postcopy recovery:
> > > > > > > >
> > > > > > > >   [RFC 00/29] Migration: postcopy failure recovery
> > > > > > > >   [RFC 0/6] migration: re-use migrate_incoming for postcopy recovery
> > > > > > > >
> > > > > > > > The root problem is that monitor commands are all handled in the main loop thread now, no matter how many monitors we specify. So if the main loop thread hangs for any reason, all monitors get stuck. This works in the reverse direction as well: if any one of the monitors hangs, it will hang the main loop, and the rest of the monitors (if there are any).
> > > > > > > >
> > > > > > > > That affects postcopy recovery, since the recovery requires user input on the destination side. If the monitors hang, the destination VM dies and loses any hope of even a final recovery.
> > > > > > > >
> > > > > > > > So sometimes we need to make sure that at least one of the monitors stays alive.
> > > > > > > >
> > > > > > > > The whole idea of this series is that instead of handling monitor commands all in the main loop thread, we do it separately in per-monitor threads. Then, even if the main loop thread hangs at any point for any reason, the per-monitor threads can still survive. Further, we add a hint in QMP/HMP to show whether a command can be executed without the BQL; if so, we avoid taking the BQL when running that command. That greatly reduces contention on the BQL. Now the only user of that new parameter (currently I call it "without-bql") is the "migrate-incoming" command, which is the only command to rescue a paused postcopy migration.
> > > > > > > >
> > > > > > > > However, even with the series, it does not mean that per-monitor threads will never hang. One example is that we can still run "info vcpus" in a per-monitor thread during a paused postcopy (in that state, page faults are never handled, and "info cpus" will never return since it tries to sync every vcpu). So to make sure it does not hang, we not only need the per-monitor thread, the user should be careful as well about how to use it.
> > > > > > > >
> > > > > > > > For postcopy recovery, we may need a dedicated monitor channel for recovery. In other words, a destination VM that supports postcopy recovery would possibly need:
> > > > > > > >
> > > > > > > >   -qmp MAIN_CHANNEL -qmp RECOVERY_CHANNEL
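(As an aside, to make that two-channel idea concrete: a rough management-side sketch in Python - purely illustrative, the socket paths and the exact rescue flow are my assumptions, not code from Peter's series:

import json
import socket

def qmp_open(path):
    # Connect to a QMP unix socket and do the capabilities handshake.
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(path)
    f = s.makefile("rw")
    json.loads(f.readline())      # server greeting banner
    f.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
    f.flush()
    json.loads(f.readline())      # handshake reply
    return f

main = qmp_open("/var/run/dst-main.sock")          # normal mgmt traffic
recovery = qmp_open("/var/run/dst-recovery.sock")  # rarely used, stays live

# If the main channel stops answering during a paused postcopy, the
# recovery channel's per-monitor thread can still take the rescue command:
recovery.write(json.dumps({"execute": "migrate-incoming",
                           "arguments": {"uri": "tcp:0:4444"}}) + "\n")
recovery.flush()
print(json.loads(recovery.readline()))

i.e. the management app has to know to keep that second connection around.)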
> > > > > > > I think this is a really horrible thing to expose to management applications. They should not need to be aware of the fact that QEMU is buggy and thus requires that certain commands be run on different monitors to work around the bug.
> > > > > >
> > > > > > It's unfortunately baked in way too deep to fix in the near term; the BQL is just too contagious and we have a fundamental design of running all the main IO emulation in one thread.
> > > > > >
> > > > > > > I'd much prefer to see the problem described handled transparently inside QEMU. One approach is to have a dedicated thread in QEMU responsible for all monitor I/O. This thread should never actually execute monitor commands, though; it would simply parse the command request and put the data onto a queue of pending commands, thus it could never hang. The command queue could be processed by the main thread, or by another thread that is interested. E.g. the migration thread could process any queued commands related to migration directly.
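(Roughly this shape, if I understand the proposal - a minimal sketch in Python rather than QEMU's actual C, with the dispatch table and queue layout assumed for illustration:

import json
import queue

pending = queue.Queue()

def monitor_io_thread(conn):
    # Runs once per monitor connection (e.g. via threading.Thread).
    # It only parses requests and queues them; it never executes
    # anything, so it can never hang on a stuck main loop.
    for line in conn:
        try:
            pending.put(json.loads(line))
        except ValueError:
            continue    # ignore malformed input in this sketch

def process_pending(handlers):
    # Called from the main loop - or from e.g. the migration thread
    # for the commands it owns - to execute what the I/O thread queued.
    while True:
        try:
            req = pending.get_nowait()
        except queue.Empty:
            return
        handler = handlers.get(req.get("execute"))
        if handler:
            handler(req.get("arguments", {}))

)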
> > > > > > That requires a change in the current API to allow async command completion (OK, that is something Marc-André's world has) so that from the one connection you can have multiple outstanding commands. Hmm, unless....
> > > > > >
> > > > > > We've also got problems that some commands don't like being run outside of the main thread (see Fam's reply on the 21st pointing out that a lot of block commands would assert).
> > > > > >
> > > > > > I think the way to move to what you describe would be:
> > > > > >   a) A separate thread for monitor IO
> > > > > >      This seems a separate problem. How hard is that? Will all the current IO mechanisms used for monitors just work if we run them in a separate thread? What about mux?
> > > > > >   b) Initially all commands get dispatched to the main thread, so nothing changes about the API.
> > > > > >   c) We create a new thread for the lock-free commands, and route lock-free commands down it.
> > > > > >   d) We start with a rule that on any one monitor connection we don't allow you to start a command until the previous one has finished.
> > > > > >
> > > > > > (d) allows us to avoid any API changes, but allows us to do lock-free stuff on a separate connection like Peter's world. We can drop (d) once we have a way of doing async commands. We can add dispatching to more threads once someone describes what they want from those threads.
> > > > > >
> > > > > > Does that work for you, Dan?
> > > > >
> > > > > It would, *provided* that we do (c) for the commands Peter wants for this migration series. IOW, I don't want to have to have logic in libvirt that either needs to add a 2nd monitor server, or open a 2nd monitor connection, to deal with migration post-copy recovery in some versions of QEMU. So whatever is needed to make post-copy recovery work has to be done for (c).
> > > >
> > > > But then doesn't that mean you're requiring us to break (d) and change the QMP interface to libvirt so it can do async stuff?
> > >
> > > Depends on your definition of break - I'm assuming there's either a way to opt in to use of an async mode for existing commands in (c), or that async commands would be added in parallel with the existing sync commands. IOW, it's not an API breakage - it's an opt-in extension of existing functionality.
> >
> > But you'd need to do async commands for all commands you issued to avoid blocking the IO thread so that you could then issue the recovery commands.
>
> I don't see why that has to be the case. In order to issue an async command, all that is needed is that command replies be allowed to be sent out of order.
>
> IOW if command A is blocking and command B is async, then we should be allowed to have the following
>
>   req A
>   req B
>   res A
>   res B
>
> Or
>
>   req A
>   req B
>   res B
>   res A
>
> Or
>
>   req B
>   req A
>   res B
>   res A
>
> etc.
>
> This does imply that you need monitor I/O processing separate from the command execution thread, but I see no need for all commands to suddenly become async. Just allowing interleaved replies is sufficient from the POV of the protocol definition. This interleaving is easy to handle from the client POV - it just requires a unique 'serial' in the request from the client, which is copied into the reply by QEMU.

OK, so for that we can just take Marc-André's syntax and call it 'id':
  https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg03634.html
then it's up to the caller to ensure those ids are unique.
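From the client side that's easy enough to handle; roughly something like this (a sketch that assumes a QEMU which may reply out of order - the helper names and socket handling are placeholders):

import itertools
import json

serial = itertools.count()
pending = {}    # id -> callback for the eventual reply

def send_async(f, command, arguments, on_reply):
    # Fire off a command without waiting; remember who wants the reply.
    cmd_id = next(serial)
    pending[cmd_id] = on_reply
    f.write(json.dumps({"execute": command,
                        "arguments": arguments,
                        "id": cmd_id}) + "\n")   # 'id' echoed back verbatim
    f.flush()

def dispatch_one(f):
    # Read one message; replies are routed by 'id', not arrival order,
    # so a slow blocking command and the recovery command can be in
    # flight on the same connection at once.
    msg = json.loads(f.readline())
    if "id" in msg:
        pending.pop(msg["id"])(msg)
    # (asynchronous events etc. would be handled here)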
I do worry about two things:

  a) With this, the caller doesn't really know which commands could run in parallel - for example, if we've got a recovery command that's executed by this non-locking thread, that's OK; we expect that to be doable in parallel. If in the future, though, we do what you initially suggested and have a bunch of commands get routed to the migration thread (say), then those would suddenly operate in parallel with other commands that were previously synchronous.

  b) I still worry how the various IO channels will behave on another thread. But that's more a general feeling than anything specific.

Dave

> Regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org       -o-           https://fstop138.berrange.com :|
> |: https://entangle-photo.org   -o-   https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK