Hi On Mon, Aug 28, 2017 at 1:08 PM Markus Armbruster <arm...@redhat.com> wrote:
> Marc-André Lureau <marcandre.lur...@gmail.com> writes: > > > On Fri, Aug 25, 2017 at 5:33 PM Dr. David Alan Gilbert < > dgilb...@redhat.com> > > wrote: > > > >> * Marc-André Lureau (marcandre.lur...@gmail.com) wrote: > >> > Hi > >> > > >> > On Wed, Aug 23, 2017 at 8:52 AM Peter Xu <pet...@redhat.com> wrote: > >> > > >> > > Firstly, introduce Monitor.use_thread, and set it for monitors that > are > >> > > using non-mux typed backend chardev. We only do this for monitors, > so > >> > > mux-typed chardevs are not suitable (when it connects to, e.g., > serials > >> > > and the monitor together). > >> > > > >> > > When use_thread is set, we create standalone thread to poll the > monitor > >> > > events, isolated from the main loop thread. Here we still need to > take > >> > > the BQL before dispatching the tasks since some of the monitor > commands > >> > > are not allowed to execute without the protection of BQL. Then this > >> > > gives us the chance to avoid taking the BQL for some monitor > commands in > >> > > the future. > >> > > > >> > > * Why this change? > >> > > > >> > > We need these per-monitor threads to make sure we can have at least > one > >> > > monitor that will never stuck (that can receive further monitor > >> > > commands). > >> > > > >> > > * So when will monitors stuck? And, how do they stuck? > >> > > > >> > > After we have postcopy and remote page faults, it's simple to > achieve a > >> > > stuck in the monitor (which is also a stuck in main loop thread): > >> > > > >> > > (1) Monitor deadlock on BQL > >> > > > >> > > As we may know, when postcopy is running on destination VM, the vcpu > >> > > threads can stuck merely any time as long as it tries to access an > >> > > uncopied guest page. Meanwhile, when the stuck happens, it is > possible > >> > > that the vcpu thread is holding the BQL. If the page fault is not > >> > > handled quickly, you'll find that monitors stop working, which is > trying > >> > > to take the BQL. > >> > > > >> > > If the page fault cannot be handled correctly (one case is a paused > >> > > postcopy, when network is temporarily down), monitors will hang > >> > > forever. Without current patch, that means the main loop hanged. > We'll > >> > > never find a way to talk to VM again. > >> > > > >> > > >> > Could the BQL be pushed down to the monitor commands level instead? > That > >> > way we wouldn't need a seperate thread to solve the hang on commands > that > >> > do not need BQL. > >> > >> If the main thread is stuck though I don't see how that helps you; you > >> have to be able to run these commands on another thread. > >> > > > > Why would the main thread be stuck? In (1) If the vcpu thread takes the > BQL > > and the command doesn't need it, it would work. In (2), info cpus > > shouldn't keep the BQL (my qapi-async series would probably help here) > > This has been discussed several times[*], but of course not with > everybody, so I'll summarize once more: asynchronous commands are not a > actually *required* for anything. They are *one* way to package the > "kick off task, receive an asynchronous message when it's done" pattern. > Another way is synchronous command for the kick off, event for the > "done". > But you would have to break or duplicate the QMP APIs. My proposal doesn't need that, a command can reenter the main loop, and keep current QMP API. > For better or worse, synchronous command + event is what we have today. > Whether adding another way to package the the same thing improves the > QMP interface is doubtful. > I would argue my series is mostly about internal refactoring for the benefit mentionned above. The fact that you can do (optionnaly) concurrent QMP commands is a nice bonus. Furthermore, it simplifies the API compared to CMD / dummy reply + EVENT. And it gives a meaning to the exisiting command "id".. > > [*] Try this one: > Message-ID: <87o9yv890z....@dusky.pond.sub.org> > https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg05483.html > -- Marc-André Lureau