On 01/05/2012 06:59 AM, Daniel P. Berrange wrote:
On Thu, Jan 05, 2012 at 10:37:14AM -0200, Luiz Capitulino wrote:
On Thu, 5 Jan 2012 10:16:30 +0000
"Daniel P. Berrange"<berra...@redhat.com> wrote:
On Wed, Jan 04, 2012 at 05:45:11PM -0200, Luiz Capitulino wrote:
This version drops modes 'sleep' and 'hybrid' because they don't work
properly due to issues in qemu. Only the 'hibernate' mode is supported
for now.
IMHO this is short-sighted. When the bugs in QEMU are fixed so that
these modes work, you have needlessly put users in the situation where
they have to now upgrade the guest agent everywhere to take advantage
of the bugfix.
That was my thinking until v4. But after discussing with Michael the issues
we have with S3 I concluded that it doesn't make sense to offer an API to
something that doesn't work, this will just generate bug reports. Also,
updating to get new features is normal and expected.
This is assuming that users will always upgrade their VMs & hosts in
lock step, which I rather doubt they will in practice. E.g. imagine a
deployment with a mixture of hosts, running QEMU 1.1 (broken S3)
and QEMU 1.2 (working S3). If they build VM disk images they will likely
use the QEMU GA from 1.2 for all their builds, even if many of them
will only run on QEMU 1.1 hosts. So you'll end up having 'sleep' and
'hybrid' commands available in the guest agent, even though the host
QEMU doesn't work properly.
So you *will* ultimately need to make sure that QEMU GA from 1.2 has
sensible behaviour when run on a QEMU 1.1 host. If you don't address
this during 1.1, you may well find yourself in an un-winnable situation
for 1.2, where it is impossible to provide good behaviour on old hosts.
So IMHO we are better off in the long run, if we include all commands
right now, even though some don't work yet, and work to ensure we have
good error reporting behaviour for those that don't work.
As an example, if S3 is broken in current QEMU, then we should not be
advertising S3 to the guest OS. This would allow 'pm-is-supported --suspend'
to return false, at which point the guest agent can send back a nice error
message 'Suspend is not supported on this host', instead of just having the
guest try to suspend & hang or worse.
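A minimal sketch of the check described above, assuming the agent shells
out to pm-is-supported (from pm-utils) before attempting a suspend. The
function name check_suspend_mode, the mode-to-flag table and the injectable
run parameter are illustrative assumptions, not the actual qemu-ga code;
only the 'sleep' / '--suspend' pairing and the error text come from the
message above.

```python
import subprocess

# Assumed mapping from guest-agent modes to pm-is-supported flags.
PM_FLAGS = {
    "hibernate": "--hibernate",
    "sleep": "--suspend",
    "hybrid": "--suspend-hybrid",
}


def check_suspend_mode(mode, run=subprocess.call):
    """Return None if the guest OS reports `mode` as supported,
    otherwise a human-readable error string for the agent to send back."""
    flag = PM_FLAGS.get(mode)
    if flag is None:
        return "Unknown suspend mode '%s'" % mode
    try:
        # pm-is-supported exits with status 0 when the mode is supported.
        status = run(["pm-is-supported", flag])
    except OSError:
        # The tool is not installed, so support cannot be determined.
        return "Cannot determine whether '%s' is supported" % mode
    if status != 0:
        return "Suspend is not supported on this host (mode '%s')" % mode
    return None
```

This only helps if the host stops advertising S3 to the guest while it is
broken; the check itself runs entirely inside the guest.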
This still requires that we're in lockstep with host QEMU (ideally that
would be the case via push-deployment of the agent, exactly because of
issues like this; or at least, it'd make the upgrade process painless). And
outside of that, I really cannot think of any nice way to check, from
the agent, that a host has required functionality for {this,an} RPC. Not
unless we forced a bi-directional capabilities negotiation sequence, and
I don't like the idea of injecting this kind of data into a guest.
libvirt could maybe filter the modes based on QEMU version, but that's
not the only consumer of the agent.
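For illustration, filtering on the management side might look roughly like
the sketch below. The version numbers reuse the hypothetical 1.1 (broken
S3) / 1.2 (working S3) split from the example earlier in the thread; they
are not real fix versions, and MIN_QEMU_VERSION and usable_modes are made-up
names, not libvirt API.

```python
# Hypothetical table: first host QEMU version (major, minor) on which
# each guest-agent suspend mode is assumed to work.
MIN_QEMU_VERSION = {
    "hibernate": (1, 1),
    "sleep": (1, 2),
    "hybrid": (1, 2),
}


def usable_modes(qemu_version):
    """Return the suspend modes a frontend should offer for a host
    running `qemu_version`, given as a (major, minor) tuple."""
    return sorted(
        mode
        for mode, minimum in MIN_QEMU_VERSION.items()
        if qemu_version >= minimum
    )
```

With this table, usable_modes((1, 1)) offers only 'hibernate', while a
(1, 2) host gets all three modes. Any other consumer of the agent would
need its own copy of such a table, which is the weakness noted above.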
Really I think this is a case study for why push-deployment of agents is
the way to go. QEMU could query qemu-ga directly and generate an 'agent
update available' event that users/frontends can use to prompt an update
to the latest version. Then all the upgrade inertia involved with saving
code/features for subsequent agent versions is mostly gone, and we can
"do the right thing" with regard to broken functionality :)
Unfortunately that option isn't available yet. But it just seems wrong
to introduce something we know is broken, to the extent that even those
involved with its development aren't currently capable of testing it
fully. I mean, we know what the user expectations are, and it's not
that, unfortunately for us :( I'd be more open to it if the bug wasn't
so bad, but nuking your guest's working state every time you make the
mistake of hitting the pretty "sleep" button in virt-manager or whatever
is pretty bad.
Daniel