Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy

Avi Kivity Mon, 28 Feb 2011 00:41:18 -0800

On 02/27/2011 07:41 PM, Anthony Liguori wrote:

I agree 100% the management tool cannot be the authoritative source of state.
My position is:
- the management tool should be 100% in control of configuration (how the guest is put together from its components)
- qemu should be 100% in control of state (memory, disk state, NVRAM in various components, cd-rom eject state, explosive bolts for payload separation, self-destruct mechanism, etc.)
There simply is not such a clean separation between the two because things that the guest does affect the configuration of the guest.
Hot plug,

I don't think hotunplug works this way. When the guest ejects the pci or usb device, it simply stops working with the device and disconnects the power. There is nothing non-volatile going on, no spring-loaded lever that pushes the device out. If the server reboots immediately after hotunplug, but before the user physically removes the device, then the server will see the device when it boots up.

removable media eject,


Here, we do have a single bit of non-volatile storage.

persistent device settings (whether it's CMOS or EEPROM) all disrupt this model.

These are just arrays of bits, most of them with no standard interpretation. So a block device fits them perfectly.

If you really wanted to have this separation, you'd have to be very strict about making all guest settings not be specified in config. You would need to do:
qemu-img create -f e1000-eprom -o macaddr=12:23:45:67:78:90 e1000.0.rom
qemu-img create -f e1000-eprom -o macaddr=12:23:45:67:78:91 e1000.1.rom
qemu -device e1000,id=e1000.0,eeprom=e1000.0.rom -device e1000,id=e1000.1,eeprom=e1000.1.rom
And now I need a tool that lets me modify e1000-eprom images if I want to change the mac address dynamically (say I'm trying to clone a VM).
This type of model can be workable but as I said earlier, I think it's overengineering the problem.

In fact I don't think anyone wants this. Usually management wants the assigned MAC to be used without the guest playing games with it. So it's more or less pointless however it's implemented.

We don't separate configuration from guest state today. Instead of setting ourselves up for failure by setting an unrealistic standard that we try to achieve and never do, let's embrace the system that is working for us today. We are authoritative for everything and guest state is intimately tied to the virtual machine configuration.

"we are authoritative for everything" is a clean break from everything that's being done today. It's also a clean break from the model of central management plus database. We can't force it on people.

Non-volatile state is not intimately tied to configuration. We store block device state completely outside the configuration. What's left is the CD-ROM tray, CMOS memory, and network card EEPROM. We could argue back and forth about where exactly they belong, but they aren't really worth the conversation since they are meaningless for real-life use.

But beyond those races, QEMU is the only entity that knows with certainty what bits of information are important to persist in order to preserve a guest across shutdown/restart. The fact that we've punted this problem for so long has only ensured that management tools are either intrinsically broken or only support the most minimal subset of functionality we actually support.
I'm not arguing about that. I just want to stress again the difference between state and configuration. Qemu has no authority, in my mind, as to configuration. Only state.
Being the one that creates a guest based on configuration, I would say that we most certainly do.


That is not what being authoritative means.

In a virt-manager deployment, libvirt is the authoritative source of guest configuration. In a RHEV-M deployment, the RHEV-M database is the authoritative source of guest configuration. You can completely replace the host machine and your guest will recreate just fine as long as the authoritative source is intact.

Currently they contain the required guest configuration, a representation of what's the current live configuration, and they issue monitor commands to move the live configuration towards the required configuration (or just generate a qemu command line). What you're describing is completely different, I'm not even sure what it is.
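
To make the "required vs. live configuration" model concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the dict-of-devices representation and the function name plan_monitor_commands are hypothetical, not anything a particular management tool implements. The only qemu names used are the device_add and device_del monitor commands.

```python
def plan_monitor_commands(required, live):
    """Return QMP-style commands that move `live` towards `required`.

    Both arguments map a device id to a dict of device properties.
    This in-memory layout is an assumption made for the example; it is
    not a qemu or libvirt format.
    """
    commands = []
    # Devices present in the live guest but no longer required.
    for dev_id in sorted(live.keys() - required.keys()):
        commands.append({"execute": "device_del",
                         "arguments": {"id": dev_id}})
    # Devices required but not yet present in the live guest.
    for dev_id in sorted(required.keys() - live.keys()):
        arguments = {"id": dev_id, **required[dev_id]}
        commands.append({"execute": "device_add",
                         "arguments": arguments})
    return commands


# The management tool's own records stay the authoritative source; the
# live view is just what it believes is currently plugged in.
required = {"e1000.0": {"driver": "e1000"}, "e1000.1": {"driver": "e1000"}}
live = {"e1000.0": {"driver": "e1000"}}
for command in plan_monitor_commands(required, live):
    print(command)
```

The sketch deliberately ignores devices whose properties changed; a real tool would have to decide whether that means unplug and replug or something else.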
Management tools shouldn't have to think about how the monitor commands they issue impact the invocation options of QEMU.
They have to, when creating a guest from scratch.
But I admit, this throws a new light (for me) on things. What are the implications?
- must have a qemu instance running when editing configuration, even when the guest is down
QMP is an API. Whether a qemu instance is launched is an implementation detail. This could all be hidden completely with libqmp.


QMP is first and foremost a protocol.

- cannot add additional information to configuration; must store it in an external database and cross-reference it with the qemu data using the device ID
Don't confuse a management tool's notion of configuration with QEMU's configuration.
A management tool's config is used to initially create and then manipulate an existing guest. If the management tool supports out-of-band manipulation of a configuration file, then it needs to determine how the configuration file changed and execute the appropriate commands.

I wasn't talking about that. I was talking about data that is meaningful to a user but not meaningful to qemu. That sort of data doesn't store well if qemu is the authoritative source.

Yes, it is. libvirt kind of cheats here and just deletes the old VM and creates a new one when editing the XML IIUC.
- no transactions/queries/etc except on non-authoritative source
- issues with shared-nothing design (well, can store the configuration file using DRBD).
In both cases, today a management tool races with QEMU so both of these points are currently true.

No, it doesn't. If the guest ejects a network card, the network card is still there. Queries against the database still return correct results.

If you look at management tools, they believe they are the authoritative source of configuration information (not guest state, which is more or less ignored).
It's because we've given them no other option.
It's the natural way of doing it. You have a web interface that talks to a database. When you want to list all VMs that have network cards on the production subnet, you issue a database query and get a recordset. How do you do that when the authoritative source of information is spread across a cluster?
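
As a rough illustration of that kind of query, here is a small sketch using Python's sqlite3 module. The schema (tables vm and nic, the subnet column) and the sample rows are invented for the example and do not describe any real management tool's database.

```python
import sqlite3


def demo_db():
    """Build a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vm (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE nic (vm_id INTEGER, mac TEXT, subnet TEXT)")
    conn.executemany("INSERT INTO vm VALUES (?, ?)",
                     [(1, "web1"), (2, "db1"), (3, "test1")])
    conn.executemany("INSERT INTO nic VALUES (?, ?, ?)",
                     [(1, "12:23:45:67:78:90", "production"),
                      (2, "12:23:45:67:78:91", "production"),
                      (3, "12:23:45:67:78:92", "lab")])
    return conn


def vms_on_subnet(conn, subnet):
    """List the names of all VMs with at least one NIC on `subnet`."""
    rows = conn.execute(
        "SELECT DISTINCT vm.name FROM vm "
        "JOIN nic ON nic.vm_id = vm.id "
        "WHERE nic.subnet = ? ORDER BY vm.name",
        (subnet,))
    return [name for (name,) in rows]


print(vms_on_subnet(demo_db(), "production"))
```

The point is that one central store answers the question in a single query, whether or not the guests (or their hosts) are running.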
This problem still exists today. A guest can eject a network card on its own (without the management tool issuing a device_del command). QEMU will delete the NIC when this happens.


I think that's a bug.

The same is true with CDROM eject.


CDROM tray position is state, not configuration.


Management tools are simply not authoritative today.

Regards,

Anthony Liguori



--
error compiling committee.c: too many arguments to function
