On 02/23/2011 09:31 AM, Avi Kivity wrote:
On 02/23/2011 04:35 PM, Anthony Liguori wrote:
On 02/23/2011 07:01 AM, Avi Kivity wrote:
On 02/23/2011 01:14 AM, Anthony Liguori wrote:
-drive already ties into the qemuopts infrastructure and we have
readconfig and writeconfig. I don't think we're missing any major
pieces to do this in a more proper fashion.
The problem with qemu config files is that it splits the
authoritative source of where images are stored into two. Is it in
the management tool's database or is it in qemu's config file?
I like to use the phrase "stateful config file". To me, it's just a
database for QEMU to persist data about the VM. It's the only way
for QEMU to make certain transactions atomic in the face of QEMU
crashing.
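
To make the atomicity point concrete, here is a minimal sketch of how a state database could be updated so that a crash leaves either the old contents or the new contents, never a half-written file. This is not QEMU code: the JSON layout and the names save_state_atomically and load_state are assumptions made up for illustration. It relies only on the usual write-to-temp-file-then-rename pattern.

```python
import json
import os
import tempfile


def save_state_atomically(path, state):
    """Replace the state file at `path` with `state` in one atomic step."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live in the same directory (same filesystem)
    # as the target, otherwise the final rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".state-")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp, indent=2, sort_keys=True)
            tmp.flush()
            os.fsync(tmp.fileno())  # data is on disk before the rename
        os.replace(tmp_path, path)  # readers see old or new, never a mix
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_state(path):
    """Return the stored state; a missing or empty file means 'no state yet'."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        content = f.read()
    return json.loads(content) if content.strip() else {}
```

An empty file loads as an empty state, which matches the case where the management tool hands over an empty file instead of prepopulating it.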
The user visible config file is a totally different concept. A
management tool launches QEMU and tells it where to keep its state
database. The management application may prepopulate the state
database or it may just use an empty file.
In that case the word 'config' is misleading. To me, it implies that
the user configures something, and qemu reads it, not something mostly
internal to qemu.
Understood.
Qemu does keep state. Currently only images, but in theory also the
on-board NVRAM.
Yeah, this is a good example of an area where a "stateful config file"
would be useful. I like the idea of storing this sort of thing in a
text file with a config structure because a user certainly wants to be
able to specify the boot order. Being able to tweak this kind of stuff
adds a lot of interesting capabilities.
QEMU uses the state database to store information that is created
dynamically. For instance, devices added through device_add. A
device added via -device wouldn't necessarily get added to the state
database.
Practically speaking, it lets you invoke QEMU with a fixed command
line, while still using the monitor to make changes that would
otherwise require the command line being updated.
Then the invoker quickly loses track of what the actual state is. It
can't just remember which commands it issued (presumably in response
to the user updating user visible state). It has to parse the
stateful config file qemu outputs.
Well specifically, it has to ask QEMU and QEMU can tell it the current
state via a nice structured data format over QMP. It's a hell of a lot
easier than the management tool trying to do this outside of QEMU.
But at which points should it parse it?
I was thinking that we should post events whenever we change the
stateful config. That would let the management tool have a mechanism
for determining when settings have been changed. Of course, if the
management tool crashes, it should re-read at startup.
I don't think it's reasonable to have three different ways to interact
with qemu, all needed: the command line, reading and writing the
stateful config file, and the monitor. I'd rather push for starting
qemu with a blank guest and assembling (cold-plugging) all the
hardware via the monitor before starting the guest.
Yes. I view the command line as optional. To me, this is the ideal
interaction:
1) start qemu with an empty stateful config file
2) issue monitor commands to create all devices and backends
3) the stateful config file totally captures the state of all of the
issued QMP commands. The management tool can relaunch the guest just by
passing the stateful config file to QEMU.
4) when the management tool needs to "extract" a config file, it can
read the stateful config (through the monitor) and generate its own config.
5) the management tool should treat the stateful config file as more or
less opaque. It shouldn't be visible to the end user.
In the non-managed case, users should interact directly with the config
file.
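
The numbered interaction above can be sketched roughly as follows. This is a toy model, not QEMU's implementation: the StateDB class, its JSON format, and the method names are hypothetical, chosen only to mirror steps 1 to 3 (start empty, create devices through monitor-style commands, relaunch from the saved state).

```python
import json


class StateDB:
    """Hypothetical stand-in for the stateful config: an ordered list of
    devices that were created dynamically through the monitor."""

    def __init__(self):
        self.devices = []

    def device_add(self, driver, dev_id, **props):
        """Record a dynamically added device (mirrors a monitor command)."""
        if any(d["id"] == dev_id for d in self.devices):
            raise ValueError(f"duplicate device id: {dev_id}")
        self.devices.append({"driver": driver, "id": dev_id, "props": props})

    def device_del(self, dev_id):
        """Forget a device that was removed again."""
        self.devices = [d for d in self.devices if d["id"] != dev_id]

    def dump(self):
        """Serialize the state so a later launch can start from it."""
        return json.dumps({"devices": self.devices}, sort_keys=True)

    @classmethod
    def load(cls, text):
        """Rebuild the state; empty text is the 'empty file' starting point."""
        db = cls()
        if text.strip():
            db.devices = json.loads(text)["devices"]
        return db


# Step 1: start from an empty state file.
db = StateDB.load("")
# Step 2: build the guest through monitor-style commands.
db.device_add("virtio-net-pci", "net0", mac="52:54:00:12:34:56")
db.device_add("virtio-blk-pci", "disk0", drive="drive0")
# Step 3: the saved state alone is enough to recreate the same devices.
relaunched = StateDB.load(db.dump())
```

The management tool never has to replay its own command history here; it only hands the saved state back on the next launch.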
For the problem at hand, one solution is to make qemu stop after the
copy, and then management can issue an additional command to
rearrange the disk and resume the guest. A drawback here is that if
management dies, the guest is stopped until it restarts. We also
make management latency guest visible, even if it doesn't die at an
inconvenient place.
An alternative approach is to have the copy be performed by a new
layered block format driver:
- create a new image, type = live-copy, containing three pieces of
  information
    - source image
    - destination image
    - copy state (initially nothing is copied)
- tell qemu to switch to the new image
- qemu starts copying, updates copy state as needed
- copy finishes, event is emitted; reads and writes still serviced
- management receives event, switches qemu to destination image
- management removes live-copy image
If management dies while this is happening, it can simply query the
state of the copy. Similarly, if qemu dies, the copy state is
persistent (could be 0/1 or real range of blocks).
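
A rough sketch of the persistent copy state described above, assuming the "real range of blocks" variant in its simplest form: a single counter of blocks copied so far. Everything here is hypothetical (the JSON state file, the function names, the tiny block size), and it deliberately ignores guest writes that arrive during the copy, which a real block driver would have to handle.

```python
import json
import os

BLOCK_SIZE = 4  # tiny on purpose so the example is easy to follow


def _save(state_path, state):
    """Persist the copy state atomically (temp file, then rename)."""
    tmp = state_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, state_path)


def create_live_copy(state_path, source, destination):
    """Create the 'live-copy image': source, destination, and copy state."""
    open(destination, "wb").close()  # start with an empty destination
    _save(state_path, {"source": source, "destination": destination,
                       "copied_blocks": 0})


def query(state_path):
    """What management would ask for after it (or qemu) restarts."""
    with open(state_path) as f:
        return json.load(f)


def copy_step(state_path, max_blocks):
    """Copy up to `max_blocks` blocks, persisting progress after each one.
    Returns True once the whole source has been copied."""
    state = query(state_path)
    total = -(-os.path.getsize(state["source"]) // BLOCK_SIZE)  # ceiling
    with open(state["source"], "rb") as src, \
            open(state["destination"], "r+b") as dst:
        for _ in range(max_blocks):
            if state["copied_blocks"] >= total:
                break
            offset = state["copied_blocks"] * BLOCK_SIZE
            src.seek(offset)
            dst.seek(offset)
            dst.write(src.read(BLOCK_SIZE))
            dst.flush()
            os.fsync(dst.fileno())  # block is durable before we record it
            state["copied_blocks"] += 1
            _save(state_path, state)
    return state["copied_blocks"] >= total
```

Because progress is recorded only after a block is durable, a crash at any point costs at most one block of repeated work when copy_step is called again.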
This is a more elegant solution to the problem than the commit
problem but it's also a one-off. I think we have a generic problem
here and we ought to try to solve it generically (within reason).
Can you give more examples?
I think I demonstrated that hot-plug can be solved via the existing
interfaces.
Sure. CMOS settings right now are not persisted across reboot. Guest
initiated activities like IDE or PCI eject are tricky to persist
correctly within a management tool.
We could add events for all of these things but it's all racy since
events are posted. If we have a stateful config file, we can make all
of these things non-racy and post an event that the config has changed.
If there's a crash, the management tool can read the config on startup
to catch up on missed events.
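
To illustrate the catch-up idea, here is a small sketch in which the event is only a hint and the stateful config is the source of truth. The classes and the generation counter are assumptions for the sake of the example and do not correspond to an existing QEMU or QMP interface.

```python
class StatefulConfig:
    """Hypothetical state database kept by qemu."""

    def __init__(self):
        self.generation = 0   # bumped on every change
        self.settings = {}
        self.listeners = []   # currently connected management tools

    def set(self, key, value):
        # The state is updated first; the event is posted afterwards and
        # carries no data that a listener could not re-read from the state.
        self.settings[key] = value
        self.generation += 1
        for listener in list(self.listeners):
            listener(self.generation)


class ManagementTool:
    """Hypothetical management side that may miss events while it is down."""

    def __init__(self, config):
        self.config = config
        self.seen_generation = 0
        self.view = {}

    def on_config_changed(self, generation):
        # The event only says "something changed"; re-read to find out what.
        self.resync()

    def resync(self):
        """Called on every event and once at startup after a crash."""
        if self.config.generation != self.seen_generation:
            self.view = dict(self.config.settings)
            self.seen_generation = self.config.generation


config = StatefulConfig()
tool = ManagementTool(config)
config.listeners.append(tool.on_config_changed)

config.set("boot_order", "cd")            # event delivered normally
config.listeners.remove(tool.on_config_changed)
config.set("ide0-cd0", "ejected")         # tool is down, event is lost
tool.resync()                              # tool restarts and catches up
```

Since a missed event costs nothing more than a re-read at startup, the ordering of events relative to other monitor traffic stops mattering.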
I think the nature of a posted event management interface is such that
we need a stateful config that persists across QEMU invocations.
Regards,
Anthony Liguori