Hi On Fri, Aug 25, 2017 at 6:12 PM Dr. David Alan Gilbert <dgilb...@redhat.com> wrote:
> * Marc-André Lureau (marcandre.lur...@gmail.com) wrote: > > On Fri, Aug 25, 2017 at 5:33 PM Dr. David Alan Gilbert < > dgilb...@redhat.com> > > wrote: > > > > > * Marc-André Lureau (marcandre.lur...@gmail.com) wrote: > > > > Hi > > > > > > > > On Wed, Aug 23, 2017 at 8:52 AM Peter Xu <pet...@redhat.com> wrote: > > > > > > > > > Firstly, introduce Monitor.use_thread, and set it for monitors > that are > > > > > using non-mux typed backend chardev. We only do this for > monitors, so > > > > > mux-typed chardevs are not suitable (when it connects to, e.g., > serials > > > > > and the monitor together). > > > > > > > > > > When use_thread is set, we create standalone thread to poll the > monitor > > > > > events, isolated from the main loop thread. Here we still need to > take > > > > > the BQL before dispatching the tasks since some of the monitor > commands > > > > > are not allowed to execute without the protection of BQL. Then > this > > > > > gives us the chance to avoid taking the BQL for some monitor > commands > > > in > > > > > the future. > > > > > > > > > > * Why this change? > > > > > > > > > > We need these per-monitor threads to make sure we can have at > least one > > > > > monitor that will never stuck (that can receive further monitor > > > > > commands). > > > > > > > > > > * So when will monitors stuck? And, how do they stuck? > > > > > > > > > > After we have postcopy and remote page faults, it's simple to > achieve a > > > > > stuck in the monitor (which is also a stuck in main loop thread): > > > > > > > > > > (1) Monitor deadlock on BQL > > > > > > > > > > As we may know, when postcopy is running on destination VM, the > vcpu > > > > > threads can stuck merely any time as long as it tries to access an > > > > > uncopied guest page. Meanwhile, when the stuck happens, it is > possible > > > > > that the vcpu thread is holding the BQL. If the page fault is not > > > > > handled quickly, you'll find that monitors stop working, which is > > > trying > > > > > to take the BQL. > > > > > > > > > > If the page fault cannot be handled correctly (one case is a paused > > > > > postcopy, when network is temporarily down), monitors will hang > > > > > forever. Without current patch, that means the main loop hanged. > > > We'll > > > > > never find a way to talk to VM again. > > > > > > > > > > > > > Could the BQL be pushed down to the monitor commands level instead? > That > > > > way we wouldn't need a seperate thread to solve the hang on commands > that > > > > do not need BQL. > > > > > > If the main thread is stuck though I don't see how that helps you; you > > > have to be able to run these commands on another thread. > > > > > > > Why would the main thread be stuck? In (1) If the vcpu thread takes the > BQL > > and the command doesn't need it, it would work. > > True; assuming nothing else in the main loop is blocked; which is a big > if - making sure no bh's etc could block on guest memory or the bql. > > > In (2), info cpus > > shouldn't keep the BQL (my qapi-async series would probably help here) > > How does that work? > > https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg03626.html With the series, a command can be broken up in receive/start & finish/reply. This allows to reenter the loop, potentially freeing the BQL, and process other events. This allowed me to fix a screendump glitch bug ( http://lists.gnu.org/archive/html/qemu-devel/2017-01/msg03650.html). This also open the door to concurrent QMP commands (if the client turns on the capability option). -- Marc-André Lureau