On 02/27/2011 07:41 PM, Anthony Liguori wrote:
I agree 100% the management tool cannot be the authoritative source
of state.
My position is:
- the management tool should be 100% in control of configuration (how
the guest is put together from its components)
- qemu should be 100% in control of state (memory, disk state, NVRAM
in various components, cd-rom eject state, explosive bolts for
payload separation, self-destruct mechanism, etc.)
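To make the proposed split concrete, here is a minimal Python sketch of a guest restart. It takes configuration from the management tool and non-volatile state from qemu's saved copy. The names (DeviceConfig, GuestState, restart) are hypothetical and do not correspond to any QEMU or libvirt data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical model for illustration only, not QEMU's internal representation.
@dataclass(frozen=True)
class DeviceConfig:
    """How the guest is put together: owned by the management tool."""
    driver: str   # e.g. "e1000", "ide-cd"
    dev_id: str   # unique device id, as in -device ...,id=


@dataclass
class GuestState:
    """What the guest has done to itself: owned by qemu."""
    nvram: Dict[str, bytes] = field(default_factory=dict)     # CMOS/EEPROM contents per device id
    tray_open: Dict[str, bool] = field(default_factory=dict)  # CD-ROM tray position per device id


@dataclass
class Guest:
    config: List[DeviceConfig]
    state: GuestState


def restart(config: List[DeviceConfig], saved: GuestState) -> Guest:
    """Recreate a guest from management's configuration plus qemu's saved state."""
    ids = {d.dev_id for d in config}
    # State only survives for devices that are still part of the configuration.
    state = GuestState(
        nvram={k: v for k, v in saved.nvram.items() if k in ids},
        tray_open={k: v for k, v in saved.tray_open.items() if k in ids},
    )
    return Guest(config=list(config), state=state)
```

In this sketch the configuration list is never modified by anything the guest does. The guest can only change what lives in GuestState.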
There simply is not such a clean separation between the two because
things that the guest does affect the configuration of the guest.
Hot plug,
I don't think hotunplug works this way. When the guest ejects the pci
or usb device, it simply stops working with the device and disconnects
the power. There is nothing non-volatile going on, no spring-loaded
lever that pushes the device out. If the server reboots immediately
after hotunplug, but before the user physically removes the device, then
the server will see the device when it boots up.
removable media eject,
Here, we do have a single bit of non-volatile storage.
persistent device settings (whether it's CMOS or EEPROM) all disrupt
this model.
These are just arrays of bits, most of them with no standard
interpretation. So a block device fits them perfectly.
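Here is a sketch of that idea, assuming a plain file stands in for the block device. It models a fixed-size byte array, such as a CMOS image, that survives a restart because every write goes straight to the backing file. FileBackedNvram is a hypothetical name; QEMU does not expose such a class.

```python
import os


class FileBackedNvram:
    """Hypothetical fixed-size byte array persisted in a file (stand-in for a block device)."""

    def __init__(self, path: str, size: int) -> None:
        self.path = path
        self.size = size
        if not os.path.exists(path):
            # A fresh image starts zeroed, like a new CMOS/EEPROM.
            with open(path, "wb") as f:
                f.write(bytes(size))

    def read(self, offset: int, length: int) -> bytes:
        self._check(offset, length)
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)

    def write(self, offset: int, data: bytes) -> None:
        self._check(offset, len(data))
        # "r+b" updates bytes in place without truncating the image.
        with open(self.path, "r+b") as f:
            f.seek(offset)
            f.write(data)

    def _check(self, offset: int, length: int) -> None:
        if offset < 0 or length < 0 or offset + length > self.size:
            raise ValueError("access outside NVRAM image")
```

No interpretation of the bytes is needed. The device model decides what each offset means, and the storage layer only has to keep the bits.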
If you really wanted to have this separation, you'd have to be very
strict about making all guest settings not be specified in config.
You would need to do:
qemu-img create -f e1000-eprom -o macaddr=12:23:45:67:78:90 e1000.0.rom
qemu-img create -f e1000-eprom -o macaddr=12:23:45:67:78:91 e1000.1.rom
qemu -device e1000,id=e1000.0,eeprom=e1000.0.rom \
     -device e1000,id=e1000.1,eeprom=e1000.1.rom
And now I need a tool that lets me modify e1000-eprom images if I want
to change the mac address dynamically (say I'm trying to clone a VM).
This type of model can be workable, but as I said earlier, I think it's
overengineering the problem.
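The editing tool mentioned above would be small if one assumes the e1000-eprom image is just the raw EEPROM contents. Under that assumption the image holds 64 little-endian 16-bit words, with the MAC address in words 0-2. Word 0x3F holds a checksum chosen so that all 64 words sum to 0xBABA, which is the convention QEMU's e1000 model follows for its emulated EEPROM. The e1000-eprom format in the commands above is hypothetical, as are the file layout and function names below.

```python
import struct

EEPROM_WORDS = 64      # EEPROM size in 16-bit words
CHECKSUM_WORD = 0x3F   # last word holds the checksum
EEPROM_SUM = 0xBABA    # words 0x00..0x3F must sum to this (mod 2**16)


def parse_mac(mac: str) -> bytes:
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"bad MAC address: {mac!r}")
    return octets


def set_mac(image: bytes, mac: str) -> bytes:
    """Return a copy of a (hypothetical) raw EEPROM image with a new MAC and fixed checksum."""
    if len(image) != EEPROM_WORDS * 2:
        raise ValueError("unexpected image size")
    words = list(struct.unpack(f"<{EEPROM_WORDS}H", image))
    octets = parse_mac(mac)
    # Words 0..2 hold the MAC, low byte first: word i = octet[2i] | octet[2i+1] << 8.
    for i in range(3):
        words[i] = octets[2 * i] | (octets[2 * i + 1] << 8)
    # Recompute the checksum so that the whole image sums to EEPROM_SUM.
    partial = sum(words[:CHECKSUM_WORD]) & 0xFFFF
    words[CHECKSUM_WORD] = (EEPROM_SUM - partial) & 0xFFFF
    return struct.pack(f"<{EEPROM_WORDS}H", *words)


def get_mac(image: bytes) -> str:
    words = struct.unpack(f"<{EEPROM_WORDS}H", image)
    octets = []
    for w in words[:3]:
        octets += [w & 0xFF, w >> 8]
    return ":".join(f"{o:02x}" for o in octets)
```

Cloning a VM then means copying the image and calling set_mac on the copy. That extra step is the overhead being called overengineering.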
In fact I don't think anyone wants this. Usually management wants the
assigned MAC to be used without the guest playing games with it. So
it's more or less pointless however it's implemented.
We don't separate configuration from guest state today. Instead of
setting ourselves up for failure by setting an unrealistic standard
that we try to achieve and never do, let's embrace the system that is
working for us today. We are authoritative for everything and guest
state is intimately tied to the virtual machine configuration.
"we are authoritative for everything" is a clean break from everything
that's being done today. It's also a clean break from the model of
central management plus database. We can't force it on people.
Non-volatile state is not intimately tied to configuration. We store
block device state completely outside the configuration. What's left is
the CD-ROM tray, CMOS memory, and network card EEPROM. We could argue
back and forth about where exactly they belong, but they aren't really
worth the conversation since they are meaningless for real-life use.
But beyond those races, QEMU is the only entity that knows with
certainty what bits of information are important to persist in order
to preserve a guest across shutdown/restart. The fact that we've
punted this problem for so long has only ensured that management
tools are either intrinsically broken or only support the most
minimal subset of functionality we actually support.
I'm not arguing about that. I just want to stress again the
difference between state and configuration. Qemu has no authority,
in my mind, as to configuration. Only state.
Being the one that creates a guest based on configuration, I would say
that we most certainly do.
That is not what being authoritative means.
In a virt-manager deployment, libvirt is the authoritative source of
guest configuration. In a RHEV-M deployment, the RHEV-M database is the
authoritative source of guest configuration. You can completely replace
the host machine and your guest will recreate just fine as long as the
authoritative source is intact.
Currently they contain the required guest configuration, a
representation of what's the current live configuration, and they
issue monitor commands to move the live configuration towards the
required configuration (or just generate a qemu command line).
What you're describing is completely different; I'm not even sure
what it is.
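The reconciliation approach described above compares the required configuration with the live one and then issues monitor commands. A minimal sketch follows. It only builds the JSON for QMP's device_add and device_del commands and leaves out sending them over the monitor socket. The plain id-to-driver dictionaries standing in for the two configurations are hypothetical.

```python
import json
from typing import Dict, List


def reconcile(required: Dict[str, str], live: Dict[str, str]) -> List[str]:
    """Return QMP command strings that move the live device set toward the
    required one. Both arguments map device id -> driver name (hypothetical model)."""
    commands = []
    # Devices qemu has that the management tool no longer wants.
    for dev_id in sorted(set(live) - set(required)):
        commands.append({"execute": "device_del", "arguments": {"id": dev_id}})
    # Devices the management tool wants that qemu does not have.
    for dev_id in sorted(set(required) - set(live)):
        commands.append({"execute": "device_add",
                         "arguments": {"driver": required[dev_id], "id": dev_id}})
    return [json.dumps(c) for c in commands]
```

If a guest ejects a NIC on its own, the NIC disappears from the live map but not from the required one, so the next pass issues a device_add for it. A real tool would also have to cope with device_del completing only after the guest cooperates, which means the live view can lag behind the commands already issued.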
Management tools shouldn't have to think about how the monitor
commands they issue impact the invocation options of QEMU.
They have to, when creating a guest from scratch.
But I admit, this throws a new light (for me) on things. What are the
implications?
- must have a qemu instance running when editing configuration, even
when the guest is down
QMP is an API. Whether a qemu instance is launched is an
implementation detail. This could all be hidden completely with libqmp.
QMP is first and foremost a protocol.
- cannot add additional information to configuration; must store it
in an external database and cross-reference it with the qemu data
using the device ID
Don't confuse a management tool's notion of configuration with QEMU's
configuration.
A management tool's config is used to initially create and then
manipulate an existing guest. If the management tool supports
out-of-band manipulation of a configuration file, then it needs to
determine how the configuration file changed and execute the
appropriate commands.
I wasn't talking about that. I was talking about data that is
meaningful to a user but not meaningful to qemu. That sort of data
doesn't store well if qemu is the authoritative source.
Yes, it is. libvirt kind of cheats here and just deletes the old VM
and creates a new one when editing the XML IIUC.
- no transactions/queries/etc except on non-authoritative source
- issues with shared-nothing design (well, can store the
configuration file using DRBD).
In both cases, today a management tool races with QEMU so both of
these points are currently true.
No, it doesn't. If the guest ejects a network card, the network card is
still there. Queries against the database still return correct results.
If you look at management tools, they believe they are the
authoritative source of configuration information (not guest state,
which is more or less ignored).
It's because we've given them no other option.
It's the natural way of doing it. You have a web interface that
talks to a database. When you want to list all VMs that have network
cards on the production subnet, you issue a database query and get a
recordset. How do you do that when the authoritative source of
information is spread across a cluster?
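With a central database, the query described above is a single SQL statement. Below is a minimal sketch using Python's sqlite3 module. The vm/nic schema and the sample rows are made up for illustration.

```python
import sqlite3


def vms_on_subnet(conn: sqlite3.Connection, subnet: str) -> list:
    """Names of VMs with at least one NIC on the given subnet (hypothetical schema)."""
    rows = conn.execute(
        "SELECT DISTINCT vm.name FROM vm JOIN nic ON nic.vm_id = vm.id "
        "WHERE nic.subnet = ? ORDER BY vm.name",
        (subnet,),
    )
    return [name for (name,) in rows]


def demo_db() -> sqlite3.Connection:
    """In-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE vm  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE nic (vm_id INTEGER REFERENCES vm(id), mac TEXT, subnet TEXT);
        INSERT INTO vm  VALUES (1, 'web1'), (2, 'db1'), (3, 'build1');
        INSERT INTO nic VALUES (1, '12:23:45:67:78:90', 'production'),
                               (1, '12:23:45:67:78:91', 'backup'),
                               (2, '12:23:45:67:78:92', 'production'),
                               (3, '12:23:45:67:78:93', 'lab');
    """)
    return conn
```

For example, vms_on_subnet(demo_db(), "production") returns ["db1", "web1"]. The rows only change when the management tool changes them, so a NIC the guest has ejected still appears in the result.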
This problem still exists today. A guest can eject a network card on
its own (without the management tool issuing a device_del command).
QEMU will delete the NIC when this happens.
I think that's a bug.
The same is true with CDROM eject.
CDROM tray position is state, not configuration.
Management tools are simply not authoritative today.
Regards,
Anthony Liguori