On Tue, Nov 03, 2020 at 03:05:08PM +0000, Stefan Hajnoczi wrote:
> On Tue, Nov 03, 2020 at 11:39:29AM +0000, Daniel P. Berrangé wrote:
> > On Mon, Nov 02, 2020 at 11:11:53AM +0000, Stefan Hajnoczi wrote:
> > > Overview
> > > --------
> > > The purpose of device states is to save the device at a point in time and then
> > > restore the device back to the saved state later. This is more challenging than
> > > it first appears.
> > >
> > > The process of saving a device state and loading it later is called
> > > *migration*. The state may be loaded by the same device that saved it or by a
> > > new instance of the device, possibly running on a different computer.
> > >
> > > It must be possible to migrate to a newer implementation of the device
> > > as well as to an older implementation of the device. This allows users
> > > to upgrade and roll back their systems.
> > >
> > > Migration can fail if loading the device state is not possible. It should fail
> > > early with a clear error message. It must not appear to complete but leave the
> > > device inoperable due to a migration problem.
> >
> > I think there needs to be an additional requirement.
> >
> > It must be possible for a management application to query the supported
> > versions, independently of execution of a migration operation.
> >
> > This is important to large scale data center / cloud management applications
> > because before initiating a migration they need to *automatically* select
> > a target host with a high level of confidence that it will be compatible with
> > the source host.
> >
> > Today QEMU migration compatibility is largely determined by the machine
> > type version. Apps can query the supported machine types for a host to
> > check whether it is compatible. Similarly they will query CPU model
> > features to check compatibility.
> >
> > Validation and error checking at time of migration is of course still
> > required, but the goal should be that an mgmt application will *NEVER*
> > hit these errors because they will have pre-selected a host that is
> > known to be compatible based on reported versions that are supported.
>
> Okay. What do you think of the following?
>
>   [
>     {
>       "model": "https://qemu.org/devices/e1000e",
>       "params": [
>         "rss",
>         ...more configuration parameters...
>       ],
>       "versions": [
>         {
>           "name": "1",
>           "params": []
>         },
>         {
>           "name": "2",
>           "params": ["rss=on"]
>         },
>         ...more versions...
>       ]
>     },
>     ...more device models...
>   ]
>
> The management tool can generate the configuration parameter list by
> expanding a version into its params.
>
> Configuration parameter types and input ranges need more thought. For
> example, version 1 of the device might not have rx-table-size (it's
> effectively 0). Version 2 introduces rx-table-size and sets it to 32.
> Version 3 raises the value to 64. In addition, the user can set a custom
> value like rx-table-size=48. I haven't defined the rules for this yet,
> but it's clear there needs to be a way to extend configuration
> parameters.
>
> To check migration compatibility:
> 1. Verify that the device model URL matches the JSON data[n].model
>    field.
> 2. For every configuration parameter name from the source device,
>    check that it is contained within the JSON data[n].params list.
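For reference, the check as proposed boils down to roughly the following.
This is a minimal Python sketch, assuming the JSON above has been parsed
into Python dicts and that version params are always "key=value" strings
like "rss=on"; `expand_version` and `params_compatible` are made-up names,
not an existing API.

```python
from __future__ import annotations

from typing import Any


def expand_version(entry: dict[str, Any], version_name: str) -> dict[str, str]:
    """Expand a named version into its configuration parameters."""
    for version in entry["versions"]:
        if version["name"] == version_name:
            # Assumes each item is a "key=value" string, as in "rss=on".
            return dict(item.split("=", 1) for item in version["params"])
    raise KeyError(f"unknown version {version_name!r} for {entry['model']}")


def params_compatible(data: list[dict[str, Any]], model: str,
                      source_params: dict[str, str]) -> bool:
    """Two-step check: model URL match, then parameter-name containment."""
    for entry in data:
        # Step 1: the device model URL must match data[n].model.
        if entry["model"] != model:
            continue
        # Step 2: every parameter name set on the source device must appear
        # in data[n].params. Only names are compared, never values.
        supported = set(entry["params"])
        return all(name in supported for name in source_params)
    # The destination does not describe this device model at all.
    return False
```

A source device created with version=2 would be checked with
params_compatible(data, url, expand_version(entry, "2")). Note that only
parameter names take part in the check; nothing in it looks at the
migration data stream itself.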
I'm not convinced that this makes sense. A matching set of parameter names
and values does not imply that the migration data stream is actually
compatible: implementations may need to change the internal migration data
stream to fix bugs, without adding or removing a config parameter. The
migration version string alone expresses data stream compatibility.

This is similar to how two QEMU command lines can have an identical set of
configuration parameters, differing only in the machine type version, and
thus be migration *incompatible*.

Basically the version string should be considered an opaque blob that
expresses compatibility on its own.

> > > VFIO Implementation
> > > -------------------
> > > The following applies both to kernel VFIO/mdev drivers and vfio-user
> > > device backends.
> > >
> > > Devices are instantiated based on a version and/or configuration
> > > parameters:
> > > * ``version=1`` - use the device configuration aliased by version 1
> > > * ``version=2,rx-filter-size=64`` - use version 2 and override
> > >   ``rx-filter-size``
> > > * ``rx-filter-size=0`` - directly set configuration parameters without
> > >   using a version
> > >
> > > Device creation fails if the version and/or configuration parameters are
> > > not supported.
> > >
> > > There must be a mechanism to query the "latest" configuration for a device
> > > model. It may simply report the ``version=5`` where 5 is the latest version
> > > but it could also report all configuration parameters instead of using a
> > > version alias.
> >
> > The mechanism needs to be able to report all supported version strings,
> > not simply the latest version string. I think we need to specify the
> > actual mechanism to do this query too, because we can't end up in a place
> > where there's a different approach to queries for each device type.
>
> Makes sense.
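To make the opaque-version point concrete, here is a minimal Python sketch
of how a management app could pre-select target hosts. It assumes each host
reports, per device model URL, the full set of version strings it supports;
the query mechanism itself is still to be specified, so the report shape and
the `compatible_hosts` name are hypothetical.

```python
from __future__ import annotations


def compatible_hosts(source_devices: dict[str, str],
                     host_reports: dict[str, dict[str, set[str]]]) -> list[str]:
    """Return the names of hosts that can accept the migration.

    source_devices: device model URL -> version string in use on the source.
    host_reports: host name -> {device model URL -> supported version strings}.
    """
    candidates = []
    for host, supported in host_reports.items():
        # Exact string membership only: the version string is never parsed,
        # and configuration parameters play no part in the decision.
        if all(version in supported.get(model, set())
               for model, version in source_devices.items()):
            candidates.append(host)
    return candidates
```

Because the version string is compared as an opaque token, a bug fix that
changes the data stream only needs a new version string; no configuration
parameter has to change for an incompatible host to be rejected.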
>
> Stefan

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|