On 02/23/2011 10:14 AM, Avi Kivity wrote:
On 02/23/2011 06:01 PM, Anthony Liguori wrote:
Qemu does keep state. Currently only images, but in theory also the
on-board NVRAM.
Yeah, this is a good example of an area where a "stateful config
file" would be useful. I like the idea of storing this sort of thing
in a text file with a config structure because a user certainly wants
to be able to specify the boot order. Being able to tweak this kind
of stuff adds a lot of interesting capabilities.
My preference would be a binary file (a disk image, in fact), with a
tool to play with the known fields. It allows a management tool to
reuse its storage stack.
I'm not sure yet what I think is best. To be honest, I was planning on
tackling this for 0.16 and focusing on the QMP side of the world for
0.15, so I don't have well-formed opinions yet.
Well specifically, it has to ask QEMU and QEMU can tell it the
current state via a nice structured data format over QMP. It's a
hell of a lot easier than the management tool trying to do this
outside of QEMU.
So, if qemu crashes, the management tool has to start it up to find
out what the current state is.
Depends on how opaque we make the state file. I've been thinking a
simple ini syntax with a well supported set of keys. In that case, a
management tool can read it without starting QEMU.
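As a rough illustration of the "read it without starting QEMU" idea, here is a minimal Python sketch. The section and key names (boot/order, cdrom/tray-open, cdrom/media) are invented for the example and are not an existing QEMU format; the only assumption is that the file is plain ini with a stable set of keys.

```python
import configparser

# Hypothetical state file contents; the section and key names are
# assumptions made for this sketch, not a format QEMU defines.
EXAMPLE_STATE = """\
[boot]
order = cd

[cdrom]
tray-open = true
media =
"""


def read_state(text):
    """Parse ini text into a plain dict of dicts, one dict per section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


state = read_state(EXAMPLE_STATE)
print(state["boot"]["order"])
print(state["cdrom"]["tray-open"])
```

Nothing here needs a running QEMU process, which is the point: with a well-supported key set, any tool with an ini parser can inspect the state after a crash.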
But at which points should it parse it?
I was thinking that we should post events whenever we change the
stateful config. That would let the management tool have a mechanism
for determining when settings have been changed. Of course, if the
management tool crashes, it should re-read at startup.
I don't think it's reasonable to have three different ways to
interact with qemu, all needed: the command line, reading and
writing the stateful config file, and the monitor. I'd rather push
for starting qemu with a blank guest and assembling (cold-plugging)
all the hardware via the monitor before starting the guest.
Yes. I view the command line as optional. To me, this is the ideal
interaction:
1) start qemu with an empty stateful config file
2) issue monitor commands to create all devices and backends
3) the stateful config file totally captures the state of all of the
issued QMP commands. The management tool can relaunch the guest just
by passing the stateful config file to QEMU.
4) when the management tool needs to "extract" a config file, it can
read the stateful config (through the monitor) and generate its own
config.
5) the management tool should treat the stateful config file as more
or less opaque. It shouldn't be visible to end user.
In the non-managed case, users should interact directly with the
config file.
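Steps 1 through 3 can be sketched as a log of issued commands that is saved and later replayed. This is only a toy model under stated assumptions: the StatefulConfig class, the replay helper, and the command names create_backend and create_device are all hypothetical; only the {"execute": ..., "arguments": ...} shape is borrowed from QMP.

```python
import json


class StatefulConfig:
    """Toy stand-in for the stateful config: an ordered log of commands."""

    def __init__(self, entries=None):
        self.entries = list(entries or [])

    def record(self, command, **arguments):
        # Step 3: every issued command is captured in the config.
        self.entries.append({"execute": command, "arguments": arguments})

    def dump(self):
        """Serialize the log so it can be stored and passed back later."""
        return json.dumps(self.entries, indent=2)

    @classmethod
    def load(cls, text):
        return cls(json.loads(text))


def replay(config, issue):
    """Relaunch path: feed each recorded command back through `issue`."""
    for entry in config.entries:
        issue(entry["execute"], **entry["arguments"])


# Steps 1 and 2: start empty, then create backends and devices.
config = StatefulConfig()
config.record("create_backend", id="disk0", file="guest.img")
config.record("create_device", id="ide0", backend="disk0")

# Relaunch: the saved text alone is enough to rebuild the guest.
saved = config.dump()
seen = []
replay(StatefulConfig.load(saved), lambda cmd, **args: seen.append((cmd, args)))
```

A real implementation would store resulting state rather than a raw command log (so that a later eject or unplug overrides an earlier create), but the round trip is the same.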
Doesn't the stateful non-config file become a failure point? It has
to be on shared and redundant storage?
It depends on what your availability model is and how frequently your
management tool backs up the config. As of right now, we have a pretty
glaring reliability hole here so adding a stateful "non-config" can only
improve things.
To me, it seems a lot easier to require management to replay any
commands that hadn't been acknowledged (due to management failure), or
to query qemu as to its current state (if it is alive).
You still have the race condition around guest-initiated events like
eject. Unless you have an acknowledged event from a management tool
(which we can't do in QMP today), whereby you don't complete the
guest-initiated eject operation until management acks it, we need to
store that state ourselves.
I don't like the idea of making a management tool such an integral part
of the functional paths. Not having a stateful config file also means
that this problem isn't solved in any form without a really
sophisticated management stack. I'm a big fan of being robust in the
face of not-so-sophisticated management tools.
Management already needs stable, redundant config storage anyway
(often a database).
Can you give more examples?
I think I demonstrated that hot-plug can be solved via the existing
interfaces.
Sure. CMOS settings right now are not persisted across reboot.
That is handled easily with an NVRAM disk image.
Guest initiated activities like IDE or PCI eject are tricky to
persist correctly within a management tool.
We could add events for all of these things but it's all racy since
events are posted. If we have a stateful config file, we can make
all of these things non-racy and post an event that the config has
changed. If there's a crash, the management tool can read the config
on startup to catch up on missed events.
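A small Python model of that "store first, then post a config-changed event, re-read on startup" flow. Everything here is hypothetical (the StateStore and Manager classes, the generation counter, the key name); it is a sketch of the idea, not of any QEMU interface.

```python
class StateStore:
    """In-memory stand-in for the stateful config (names are hypothetical)."""

    def __init__(self):
        self.generation = 0   # bumped on every change
        self.values = {}
        self.listeners = []

    def update(self, key, value):
        # The state is stored first; the event is posted afterwards and
        # carries no data of its own, so losing it loses nothing.
        self.values[key] = value
        self.generation += 1
        for notify in list(self.listeners):
            notify()


class Manager:
    """Management tool that mirrors the stored state."""

    def __init__(self, store):
        self.store = store
        self.seen_generation = 0
        self.view = {}

    def sync(self):
        # Used both as the event handler and as the startup re-read.
        if self.store.generation != self.seen_generation:
            self.view = dict(self.store.values)
            self.seen_generation = self.store.generation


store = StateStore()
manager = Manager(store)
store.listeners.append(manager.sync)

store.update("cdrom.tray-open", True)    # event delivered normally
store.listeners.remove(manager.sync)     # management tool crashes
store.update("cdrom.tray-open", False)   # this event is never seen

restarted = Manager(store)
restarted.sync()                         # re-read on startup catches up
```

Because the event only says "something changed", the restarted manager ends up with the same view it would have had if it had seen every event.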
If qemu crashes, these events are meaningless. If management crashes,
it has to query qemu for all state that it wants to keep track of via
events.
Think power failure, not qemu crash. In the event of a power failure,
any hardware change initiated by the guest ought to still be in effect
when the guest has restarted. If you eject the CDROM tray and then lose
power, it's still ejected after the power comes back on.
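To make the power-failure case concrete, here is a minimal sketch of writing a state file so that it is either fully old or fully new after power loss. It assumes a POSIX filesystem (the directory fsync does not work on Windows) and uses JSON and the save_state/load_state/eject names purely for illustration.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Atomically replace `path` with `state`, durable across power loss."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".state-")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())      # data reaches the disk first
        os.replace(tmp_path, path)      # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Persist the rename itself by syncing the containing directory.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def load_state(path):
    with open(path) as f:
        return json.load(f)


def eject(path):
    """Guest-initiated eject: only complete once the state is on disk."""
    state = load_state(path)
    state["cdrom"]["tray-open"] = True
    save_state(path, state)
```

The ordering is what matters: the eject is reported as complete only after save_state returns, so the tray is still open when the power comes back.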
I think the nature of a posted event management interface is such
that we need a stateful config that persists across QEMU invocations.
I'm not convinced, and I think making qemu manage even more state
creates more problems.
Well, this patch series is making qemu manage more state. The only
question is whether we do this as a one-off mechanism or whether we
architect a general mechanism to do it.
How much state we store can always be up for discussion but I think it's
undeniable that we need to store more state than we're storing today (none).
Regards,
Anthony Liguori