Re: [Qemu-devel] [RFC v2 2/8] monitor: allow monitor to create thread to poll

Marc-André Lureau Fri, 25 Aug 2017 09:22:31 -0700

Hi

On Fri, Aug 25, 2017 at 6:12 PM Dr. David Alan Gilbert <dgilb...@redhat.com>
wrote:


> * Marc-André Lureau (marcandre.lur...@gmail.com) wrote:
> > On Fri, Aug 25, 2017 at 5:33 PM Dr. David Alan Gilbert <
> dgilb...@redhat.com>
> > wrote:
> >
> > > * Marc-André Lureau (marcandre.lur...@gmail.com) wrote:
> > > > Hi
> > > >
> > > > On Wed, Aug 23, 2017 at 8:52 AM Peter Xu <pet...@redhat.com> wrote:
> > > >
> > > > > Firstly, introduce Monitor.use_thread, and set it for monitors
> that are
> > > > > using non-mux typed backend chardev.  We only do this for
> monitors, so
> > > > > mux-typed chardevs are not suitable (when it connects to, e.g.,
> serials
> > > > > and the monitor together).
> > > > >
> > > > > When use_thread is set, we create standalone thread to poll the
> monitor
> > > > > events, isolated from the main loop thread.  Here we still need to
> take
> > > > > the BQL before dispatching the tasks since some of the monitor
> commands
> > > > > are not allowed to execute without the protection of BQL.  Then
> this
> > > > > gives us the chance to avoid taking the BQL for some monitor
> commands
> > > in
> > > > > the future.
> > > > >
> > > > > * Why this change?
> > > > >
> > > > > We need these per-monitor threads to make sure we can have at
> least one
> > > > > monitor that will never stuck (that can receive further monitor
> > > > > commands).
> > > > >
> > > > > * So when will monitors stuck?  And, how do they stuck?
> > > > >
> > > > > After we have postcopy and remote page faults, it's simple to
> achieve a
> > > > > stuck in the monitor (which is also a stuck in main loop thread):
> > > > >
> > > > > (1) Monitor deadlock on BQL
> > > > >
> > > > > As we may know, when postcopy is running on destination VM, the
> vcpu
> > > > > threads can stuck merely any time as long as it tries to access an
> > > > > uncopied guest page.  Meanwhile, when the stuck happens, it is
> possible
> > > > > that the vcpu thread is holding the BQL.  If the page fault is not
> > > > > handled quickly, you'll find that monitors stop working, which is
> > > trying
> > > > > to take the BQL.
> > > > >
> > > > > If the page fault cannot be handled correctly (one case is a paused
> > > > > postcopy, when network is temporarily down), monitors will hang
> > > > > forever.  Without current patch, that means the main loop hanged.
> > > We'll
> > > > > never find a way to talk to VM again.
> > > > >
> > > >
> > > > Could the BQL be pushed down to the monitor commands level instead?
> That
> > > > way we wouldn't need a seperate thread to solve the hang on commands
> that
> > > > do not need BQL.
> > >
> > > If the main thread is stuck though I don't see how that helps you; you
> > > have to be able to run these commands on another thread.
> > >
> >
> > Why would the main thread be stuck? In (1) If the vcpu thread takes the
> BQL
> > and the command doesn't need it, it would work.
>
> True; assuming nothing else in the main loop is blocked;  which is a big
> if - making sure no bh's etc could block on guest memory or the bql.
>
>
> In (2),  info cpus
> > shouldn't keep the BQL (my qapi-async series would probably help here)
>
> How does that work?
>
>
https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg03626.html

With the series, a command can be broken up in receive/start &
finish/reply. This allows to reenter the loop, potentially freeing the BQL,
and process other events. This allowed me to fix a screendump glitch bug (
http://lists.gnu.org/archive/html/qemu-devel/2017-01/msg03650.html). This
also open the door to concurrent QMP commands (if the client turns on the
capability option).

-- 
Marc-André Lureau

Re: [Qemu-devel] [RFC v2 2/8] monitor: allow monitor to create thread to poll

Reply via email to