On Tue, Nov 03, 2020 at 11:39:29AM +0000, Daniel P. Berrangé wrote:
> On Mon, Nov 02, 2020 at 11:11:53AM +0000, Stefan Hajnoczi wrote:
> > Overview
> > --------
> > The purpose of device states is to save the device at a point in time and
> > then restore the device back to the saved state later. This is more
> > challenging than it first appears.
> >
> > The process of saving a device state and loading it later is called
> > *migration*. The state may be loaded by the same device that saved it or
> > by a new instance of the device, possibly running on a different computer.
> >
> > It must be possible to migrate to a newer implementation of the device
> > as well as to an older implementation of the device. This allows users
> > to upgrade and roll back their systems.
> >
> > Migration can fail if loading the device state is not possible. It should
> > fail early with a clear error message. It must not appear to complete but
> > leave the device inoperable due to a migration problem.
>
> I think there needs to be an additional requirement.
>
> It must be possible for a management application to query the supported
> versions, independently of execution of a migration operation.
>
> This is important to large scale data center / cloud management applications
> because before initiating a migration they need to *automatically* select
> a target host with a high level of confidence that it will be compatible
> with the source host.
>
> Today QEMU migration compatibility is largely determined by the machine
> type version. Apps can query the supported machine types for a host to
> check whether it is compatible. Similarly they will query CPU model
> features to check compatibility.
>
> Validation and error checking at time of migration is of course still
> required, but the goal should be that an mgmt application will *NEVER*
> hit these errors because they will have pre-selected a host that is
> known to be compatible based on reported versions that are supported.

Okay. What do you think of the following?

  [
    {
      "model": "https://qemu.org/devices/e1000e",
      "params": [
        "rss",
        ...more configuration parameters...
      ],
      "versions": [
        {
          "name": "1",
          "params": []
        },
        {
          "name": "2",
          "params": ["rss=on"]
        },
        ...more versions...
      ]
    },
    ...more device models...
  ]

The management tool can generate the configuration parameter list by
expanding a version into its params.

Configuration parameter types and input ranges need more thought. For
example, version 1 of the device might not have rx-table-size (it's
effectively 0). Version 2 introduces rx-table-size and sets it to 32.
Version 3 raises the value to 64. In addition, the user can set a custom
value like rx-table-size=48. I haven't defined the rules for this yet,
but it's clear there needs to be a way to extend configuration
parameters.

To check migration compatibility:
1. Verify that the device model URL matches the JSON data[n].model
   field.
2. For every configuration parameter name from the source device, check
   that it is contained within the JSON data[n].params list.

> > VFIO Implementation
> > -------------------
> > The following applies both to kernel VFIO/mdev drivers and vfio-user
> > device backends.
> >
> > Devices are instantiated based on a version and/or configuration
> > parameters:
> > * ``version=1`` - use the device configuration aliased by version 1
> > * ``version=2,rx-filter-size=64`` - use version 2 and override
> >   ``rx-filter-size``
> > * ``rx-filter-size=0`` - directly set configuration parameters without
> >   using a version
> >
> > Device creation fails if the version and/or configuration parameters are
> > not supported.
> >
> > There must be a mechanism to query the "latest" configuration for a device
> > model. It may simply report the ``version=5`` where 5 is the latest
> > version but it could also report all configuration parameters instead of
> > using a version alias.
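
To make the version expansion and the two-step compatibility check above
more concrete, here is a rough Python sketch. It is only an illustration
under assumptions: the function names (parse_params, expand, check_compat)
are hypothetical, the sample data contains just the entries shown in the
JSON proposal, and only parameter *names* are compared since the rules for
types and value ranges are not defined yet.

```python
# Hypothetical sample data in the shape of the JSON proposal above.
DEVICE_MODELS = [
    {
        "model": "https://qemu.org/devices/e1000e",
        "params": ["rss"],
        "versions": [
            {"name": "1", "params": []},
            {"name": "2", "params": ["rss=on"]},
        ],
    },
]


def parse_params(items):
    """Turn ["rss=on", ...] into {"rss": "on", ...}."""
    result = {}
    for item in items:
        name, _, value = item.partition("=")
        result[name] = value
    return result


def expand(model_info, version=None, overrides=()):
    """Expand a version alias into its params, then apply user overrides."""
    params = {}
    if version is not None:
        for candidate in model_info["versions"]:
            if candidate["name"] == version:
                params = parse_params(candidate["params"])
                break
        else:
            raise ValueError(f"unsupported version {version!r}")
    params.update(parse_params(overrides))
    return params


def check_compat(dest_models, src_model, src_params):
    """Return a list of problems; an empty list means no mismatch was found."""
    # Step 1: the device model URL must match a data[n].model field.
    for info in dest_models:
        if info["model"] == src_model:
            break
    else:
        return [f"device model {src_model} not supported"]
    # Step 2: every source parameter name must appear in data[n].params.
    supported = set(info["params"])
    return [
        f"parameter {name} not supported"
        for name in src_params
        if name not in supported
    ]
```

A management tool would call expand() for the source device and then pass
the resulting parameter names to check_compat() with the destination's
reported data before starting a migration.
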
>
> The mechanism needs to be able to report all supported version strings,
> not simply the latest version string. I think we need to specify the
> actual mechanism to do this query too, because we can't end up in a place
> where there's a different approach to queries for each device type.

Makes sense.

Stefan