Hi

On Mon, Aug 28, 2017 at 1:08 PM Markus Armbruster <arm...@redhat.com> wrote:

> Marc-André Lureau <marcandre.lur...@gmail.com> writes:
>
> > On Fri, Aug 25, 2017 at 5:33 PM Dr. David Alan Gilbert <dgilb...@redhat.com> wrote:
> >
> >> * Marc-André Lureau (marcandre.lur...@gmail.com) wrote:
> >> > Hi
> >> >
> >> > On Wed, Aug 23, 2017 at 8:52 AM Peter Xu <pet...@redhat.com> wrote:
> >> >
> >> > > Firstly, introduce Monitor.use_thread, and set it for monitors that
> >> > > are using a non-mux typed backend chardev.  We only do this for
> >> > > monitors, so mux-typed chardevs are not suitable (e.g., when the
> >> > > chardev connects to a serial and the monitor together).
> >> > >
> >> > > When use_thread is set, we create a standalone thread to poll the
> >> > > monitor events, isolated from the main loop thread.  Here we still
> >> > > need to take the BQL before dispatching the tasks, since some of the
> >> > > monitor commands are not allowed to execute without the protection
> >> > > of the BQL.  This then gives us the chance to avoid taking the BQL
> >> > > for some monitor commands in the future.
> >> > >
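(To make the dispatch model concrete: a minimal sketch of the loop each
monitor thread would run, as I read the description.  The monitor_*
helpers and the shutdown flag are invented for illustration, not names
from the patch:)

    static void *monitor_iothread_fn(void *opaque)
    {
        Monitor *mon = opaque;

        while (!mon->shutdown) {                /* hypothetical flag */
            /* Poll this monitor's chardev here, not in the main loop. */
            monitor_poll_events(mon);           /* hypothetical helper */

            if (monitor_has_request(mon)) {     /* hypothetical helper */
                /* Dispatch still happens under the BQL for now; BQL-free
                 * commands could skip this lock later. */
                qemu_mutex_lock_iothread();
                monitor_dispatch_request(mon);  /* hypothetical helper */
                qemu_mutex_unlock_iothread();
            }
        }
        return NULL;
    }
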
> >> > > * Why this change?
> >> > >
> >> > > We need these per-monitor threads to make sure we can have at least
> >> > > one monitor that will never get stuck (i.e., one that can always
> >> > > receive further monitor commands).
> >> > >
> >> > > * So when will monitors get stuck?  And how do they get stuck?
> >> > >
> >> > > After we have postcopy and remote page faults, it's easy to end up
> >> > > with a hang in the monitor (which is also a hang of the main loop
> >> > > thread):
> >> > >
> >> > > (1) Monitor deadlock on BQL
> >> > >
> >> > > As we know, when postcopy is running on the destination VM, a vcpu
> >> > > thread can get stuck at almost any time, as soon as it tries to
> >> > > access an uncopied guest page.  Meanwhile, when the hang happens, it
> >> > > is possible that the vcpu thread is holding the BQL.  If the page
> >> > > fault is not handled quickly, you'll find that the monitors stop
> >> > > working, since they are trying to take the BQL.
> >> > >
> >> > > If the page fault cannot be handled correctly (one case is a paused
> >> > > postcopy, when the network is temporarily down), the monitors will
> >> > > hang forever.  Without this patch, that means the main loop hangs,
> >> > > and we'll never be able to talk to the VM again.
> >> > >
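(For readers skimming the thread, the deadlock in (1) reduces to this
lock-ordering picture; a sketch, not code from the tree:)

    /* vcpu thread (postcopy destination): */
    qemu_mutex_lock_iothread();        /* holds the BQL */
    /* ... touches an uncopied guest page and blocks in the kernel
     * until userfaultfd delivers the page from the source ... */

    /* main loop thread (where the monitor runs today): */
    qemu_mutex_lock_iothread();        /* blocks behind the vcpu thread */
    /* => the monitor and the whole main loop hang until the page
     * arrives; forever, if postcopy stays paused. */
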
> >> >
> >> > Could the BQL be pushed down to the level of the monitor commands
> >> > instead?  That way we wouldn't need a separate thread to solve the
> >> > hang for commands that do not need the BQL.
> >>
> >> If the main thread is stuck, though, I don't see how that helps you;
> >> you have to be able to run these commands on another thread.
> >>
> >
> > Why would the main thread be stuck?  In (1), if the vcpu thread takes
> > the BQL and the command doesn't need it, it would work.  In (2), "info
> > cpus" shouldn't keep the BQL (my qapi-async series would probably help
> > here).
>
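(To spell out the "push the BQL down" idea: instead of the dispatcher
taking the BQL around every handler, only handlers that touch global
state would take it themselves.  A rough sketch with hypothetical
command names:)

    void qmp_bql_free_query(Error **errp)      /* hypothetical */
    {
        /* reads only thread-safe state; no BQL needed */
    }

    void qmp_needs_bql_command(Error **errp)   /* hypothetical */
    {
        qemu_mutex_lock_iothread();
        /* ... mutate global state under the BQL ... */
        qemu_mutex_unlock_iothread();
    }
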
> This has been discussed several times[*], but of course not with
> everybody, so I'll summarize once more: asynchronous commands are not
> actually *required* for anything.  They are *one* way to package the
> "kick off task, receive an asynchronous message when it's done" pattern.
> Another way is a synchronous command for the kick-off, plus an event
> for the "done".
>
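(For concreteness, the existing "synchronous command + event" packaging
looks like this on the wire, with migration as the familiar example;
the uri and timestamp are made up:)

    -> { "execute": "migrate", "arguments": { "uri": "tcp:dst:4444" } }
    <- { "return": {} }
    ... later, when the work completes ...
    <- { "event": "MIGRATION", "data": { "status": "completed" },
         "timestamp": { "seconds": 1503900000, "microseconds": 0 } }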

But then you would have to break or duplicate the QMP APIs.  My proposal
doesn't need that: a command can re-enter the main loop and keep the
current QMP API.


> For better or worse, synchronous command + event is what we have today.
> Whether adding another way to package the same thing improves the
> QMP interface is doubtful.
>

I would argue my series is mostly about internal refactoring, for the
benefit mentioned above.  The fact that you can (optionally) run
concurrent QMP commands is a nice bonus.  Furthermore, it simplifies the
API compared to CMD / dummy reply + EVENT.  And it gives a meaning to
the existing command "id".


>
> [*] Try this one:
> Message-ID: <87o9yv890z....@dusky.pond.sub.org>
> https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg05483.html
>
-- 
Marc-André Lureau
