Re: [Qemu-devel] [PATCH v4 0/2]: qemu-ga: Add the guest-suspend command

Luiz Capitulino Fri, 06 Jan 2012 11:05:02 -0800

On Thu, 05 Jan 2012 15:41:33 -0600
Michael Roth <mdr...@linux.vnet.ibm.com> wrote:


> On 01/05/2012 02:25 PM, Luiz Capitulino wrote:
> > On Thu, 05 Jan 2012 09:10:50 -0600
> > Michael Roth<mdr...@linux.vnet.ibm.com>  wrote:
> >
> >> On 01/05/2012 08:42 AM, Luiz Capitulino wrote:
> >>> On Thu, 5 Jan 2012 12:59:27 +0000
> >>> "Daniel P. Berrange"<berra...@redhat.com>   wrote:
> >>>
> >>>> On Thu, Jan 05, 2012 at 10:37:14AM -0200, Luiz Capitulino wrote:
> >>>>> On Thu, 5 Jan 2012 10:16:30 +0000
> >>>>> "Daniel P. Berrange"<berra...@redhat.com>   wrote:
> >>>>>
> >>>>>> On Wed, Jan 04, 2012 at 05:45:11PM -0200, Luiz Capitulino wrote:
> >>>>>>> This version drops modes 'sleep' and 'hybrid' because they don't work
> >>>>>>> properly due to issues in qemu. Only the 'hibernate' mode is supported
> >>>>>>> for now.
> >>>>>>
> >>>>>> IMHO this is short-sighted. When the bugs QEMU in are fixed so that
> >>>>>> these modes work, you have needlessly put users in the situation where
> >>>>>> they have to now upgrade the guest agent everywhere to take advantage
> >>>>>> of the bugfix.
> >>>>>
> >>>>> That was my thinking until v4. But after discussing with Michael the 
> >>>>> issues
> >>>>> we have with S3 I concluded that it doesn't make sense to offer an API 
> >>>>> to
> >>>>> something that doesn't work, this will just generate bug reports. Also,
> >>>>> updating to get new features is normal and expected.
> >>>>
> >>>> This is assuming that users will always upgrade their VMs&   hosts in
> >>>> lock step, which I rather doubt they will in practice. eg imagine a
> >>>> deployment might have a mixture of hosts, running QEMU 1.1 (broken S3)
> >>>> and QEMU 1.2 (working S3). If they build VM disk images they will likely
> >>>> use the QEMU GA from 1.2 for all their builds, even if many of them
> >>>> will only run on QEMU 1.1 hosts. So you'll end up having 'sleep' and
> >>>> 'hybrid' commands available in the guest agent, even though the host
> >>>> QEMU doesn't work properly.
> >>>>
> >>>> So you *will* ultimately need to make sure that QEMU GA from 1.2, has
> >>>> sensible behaviour when run on a QEMU 1.1 host.  If you don't address
> >>>> this during 1.1, you may well find yourself in an un-winnable situation
> >>>> for 1.2, where it is impossible to provide good behaviour on old hosts.
> >>>>
> >>>> So IMHO we are better off in the long run, if we include all commands
> >>>> right now, even though some don't work yet, and work to ensure we have
> >>>> good error reporting behaviour for those that don't work.
> >>>
> >>> Yes, I agree. As a side note: if we add error reporting it will only work
> >>> on 1.1 and later.  Ie, the problem you describe above will still happen
> >>> with 1.0.
> >>>
> >>> But what you're suggesting seems to be the right thing to do. Do you agree
> >>> Michael?
> >>
> >> Agree, but unless we add an RPC that QEMU uses to advertise
> >> capabilities, I'm really not sure it's possible to detect whether or not
> >> the host will support it.
> >
> > You mean an RPC to advertise if 'sleep' is supported? I think this is best 
> > done
> > by making guest-suspend return an error as suggested by Daniel, otherwise a
> > client that doesn't query for capabilities might run in trouble.
> 
> Agreed, but what I mean is that if the user executes the suspend using 
> on up-level agent running on a down-level 1.0 host, the agent will still 
> see s3 advertised and issue the buggy suspend. That's why I suggested 
> the host->agent capabilities reporting as a possible (but somewhat ugly) 
> way to just simply tell the agent it can handle it (and, lacking that, 
> assume that it can't).

That makes sense.

> 
> >
> > There's an important detail though: we need to make qemu not advertise S3 
> > for
> > this to work. However, we might be able to fix S3 for 1.1 (and bugs, like 
> > the
> > S4 ones, can't be detected, limiting the scope of the 'unsupported' error).
> >
> > So, we could merge all modes and commit to get S3 fixed for 1.1 :)
> 
> No disagreement there, if we can commit to making qemu-ga/qemu 1.1 
> releases interoperable in this manner (whether by fixing s3 or not 
> advertising it), I think that approach is perfectly fine, ideal even. 
> Doing a 1.1 release where qemu and qemu-ga are not interoperable (qemu 
> missing s3 support, qemu-ga using s3) was my main objection.

I see.

> But there is a 2nd topic here I'm trying to mull over: what is qemu-ga's 
> support policy for down-level hosts? backward-compatible? incompatible?

That's a good question, I think we should be backward-compatible, but I think
that's not going to be trivial.

> The above approach to this problem suggests the latter (qemu-ga 1.1 has 
> RPCs that will knowingly break 1.0 qemu instances). We could solve this 
> by introducing the capabilities negotiation I mentioned early. It 
> actually wouldn't need to be anything other than qemu telling qemu-ga 
> what qemu-ga version-level it supports. By default we assume 1.0, and 
> limit qemu-ga to that until qemu-ga is told otherwise (so, no 
> sleep/hybrid suspend modes). For new RPCs we may be able to handle this 
> version automatically, since we include qemu version levels for the RPCs 
> in the schema. For functionality within an RPC (like sleep/hybrid 
> suspend modes) we could use conditional code.
> 
> If we take that approach (maintaining backward-compatibility), we'd need 
> to introduce that code in the agent now, and require qemu/libvirt 
> execute the guest-set-support-level RPC or whatever to access these 1.1 
> features.

What does guest-set-support-level do? It enables all 1.1 post features?

A different approach would be to add a new field in the command dict in
the schema file, say 'broken-in-qemu-version', and change qemu-ga to check
that field in its main loop before executing a command. If
'broken-in-qemu-version' <= qemu version qemu-ga returns an not supported
error.

For commands like the guest-suspend which is partially supported, we'd have
to do a manual check for the qemu version as you suggested above.

That's just an idea though, I'm not sure what's the best way to do this.

> 
> Technically, there's a required RPC qemu-ga clients need to execute 
> already: guest-sync. It's required because we have no way to reliably 
> detect EOF over virtio-serial, and thus an agent may send stale data to 
> a newly-connected qemu-ga client, so the client needs to do the 
> guest-sync command to find the expected response and re-sync the 
> streams. We could roll the guest-set-support-level functionality into 
> that. Basically just add another field.
> 
> >
> >> And if we can't detect that reliably, we're
> >> better off leaving it out for now, because sleeping guests is not
> >> obscure functionality, and accidentally nuking guests when a user sleeps
> >> them (presumably because they want to retain their working state) is
> >> much worse than telling a user to upgrade their agent, or not supported
> >> or whatever.
> >>
> >>>
> >>>> As an example, if S3 is broken in current QEMU, then we should not be
> >>>> advertizing S3 to the guest OS. This would allow 'pm-is-supported 
> >>>> --suspend'
> >>>> to return false, at which point the guest agent can send back a nice 
> >>>> error
> >>>> message 'Suspend is not supported on this host', instead of just having 
> >>>> the
> >>>> guest try to suspend&   hang or worse.
> >>>
> >>
> >
>

Re: [Qemu-devel] [PATCH v4 0/2]: qemu-ga: Add the guest-suspend command

Reply via email to