"Dr. David Alan Gilbert" <dgilb...@redhat.com> writes: > * Daniel P. Berrange (berra...@redhat.com) wrote: >> On Thu, Sep 07, 2017 at 04:13:41PM +0800, Peter Xu wrote: >> > On Wed, Sep 06, 2017 at 12:54:28PM +0100, Daniel P. Berrange wrote: >> > > On Wed, Sep 06, 2017 at 12:31:58PM +0100, Dr. David Alan Gilbert wrote: >> > > > * Daniel P. Berrange (berra...@redhat.com) wrote: >> > > > > This does imply that you need a separate monitor I/O processing, >> > > > > from the >> > > > > command execution thread, but I see no need for all commands to >> > > > > suddenly >> > > > > become async. Just allowing interleaved replies is sufficient from >> > > > > the >> > > > > POV of the protocol definition. This interleaving is easy to handle >> > > > > from >> > > > > the client POV - just requires a unique 'serial' in the request by >> > > > > the >> > > > > client, that is copied into the reply by QEMU. >> > > > >> > > > OK, so for that we can just take Marc-André's syntax and call it 'id': >> > > > https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg03634.html >> > > > >> > > > then it's upto the caller to ensure those id's are unique. >> > > >> > > Libvirt has in fact generated a unique 'id' for every monitor command >> > > since day 1 of supporting QMP. >> > > >> > > > I do worry about two things: >> > > > a) With this the caller doesn't really know which commands could be >> > > > in parallel - for example if we've got a recovery command that's >> > > > executed by this non-locking thread that's OK, we expect that >> > > > to be doable in parallel. If in the future though we do >> > > > what you initially suggested and have a bunch of commands get >> > > > routed to the migration thread (say) then those would suddenly >> > > > operate in parallel with other commands that we're previously >> > > > synchronous. >> > > >> > > We could still have an opt-in for async commands. eg default to executing >> > > all commands in the main thread, unless the client issues an explicit >> > > "make it async" command, to switch to allowing the migration thread to >> > > process it async. >> > > >> > > { "execute": "qmp_allow_async", >> > > "data": { "commands": [ >> > > "migrate_cancel", >> > > ] } } >> > > >> > > >> > > { "return": { "commands": [ >> > > "migrate_cancel", >> > > ] } } >> > > >> > > The server response contains the subset of commands from the request >> > > for which async is supported. >> > > >> > > That gives good negotiation ability going forward as we incrementally >> > > support async on more commands. >> > >> > I think this goes back to the discussion on which design we'd like to >> > choose. IMHO the whole async idea plus the per-command-id is indeed >> > cleaner and nicer, and I believe that can benefit not only libvirt, >> > but also other QMP users. The problem is, I have no idea how long >> > it'll take to let us have such a feature - I believe that will include >> > QEMU and Libvirt to both support that. And it'll be a pity if the >> > postcopy recovery cannot work only because we cannot guarantee a >> > stable monitor. >> >> This is not a blocker for having postcopy recovery feature merged. >> It merely means that in a situation where the mainloop is blocked, >> then we can't recover, in other situations we'll be able to recover >> fine. Sure it would be nice to fix that problem too, but I don't >> see it as a block. > > It's probably OK to merge the recovery code before the monitor code; > but I don't think it's something you'd want to tell users about - > a 'postcopy recovery that only works rarely' isn't much use.
"Rarely"? Are main loop hangs *that* common? Can we quantify the problem to help gauge urgency?