On Tue, 3 Nov 2020 15:33:56 +0000 Daniel P. Berrangé <berra...@redhat.com> wrote:
> On Tue, Nov 03, 2020 at 04:23:43PM +0100, Christophe de Dinechin wrote:
> >
> > On 2020-11-02 at 12:11 CET, Stefan Hajnoczi wrote...
> > > There is discussion about VFIO migration in the "Re: Out-of-Process
> > > Device Emulation session at KVM Forum 2020" thread. The current status
> > > is that Kirti proposed a VFIO device region type for saving and loading
> > > device state. There is currently no guidance on migrating between
> > > different device versions or device implementations from different
> > > vendors. This is known to be non-trivial and raised discussion about
> > > whether it should really be handled by VFIO or centralized in QEMU.
> > >
> > > Below is a document that describes how to ensure migration compatibility
> > > in VFIO. It does not require changes to the VFIO migration interface. It
> > > can be used for both VFIO/mdev kernel devices and vfio-user devices.
> > >
> > > The idea is that the device state blob is opaque to the VMM but the same
> > > level of migration compatibility that exists today is still available.
> > >
> > > I hope this will help us reach consensus and let us discuss specifics.
> > >
> > > If you followed the previous discussion, I changed the approach from
> > > sending a magic constant in the device state blob to identifying device
> > > models by URIs. Therefore the device state structure does not need to be
> > > defined here - the critical information for ensuring device migration
> > > compatibility is the device model and configuration defined below.
> > >
> > > Stefan
> > > ---
> > > VFIO Migration
> > > ==============
> > > This document describes how to save and load VFIO device states. Saving
> > > a device state produces a snapshot of a VFIO device's state that can be
> > > loaded again at a later point in time to resume the device from the
> > > snapshot.
> > >
> > > The data representation of the device state is outside the scope of this
> > > document.
> > >
> > > Overview
> > > --------
> > > The purpose of device states is to save the device at a point in time
> > > and then restore the device back to the saved state later. This is more
> > > challenging than it first appears.
> > >
> > > The process of saving a device state and loading it later is called
> > > *migration*. The state may be loaded by the same device that saved it
> > > or by a new instance of the device, possibly running on a different
> > > computer.
> > >
> > > It must be possible to migrate to a newer implementation of the device
> > > as well as to an older implementation of the device. This allows users
> > > to upgrade and roll back their systems.
> > >
> > > Migration can fail if loading the device state is not possible. It
> > > should fail early with a clear error message. It must not appear to
> > > complete but leave the device inoperable due to a migration problem.
> > >
> > > The rest of this document describes how these requirements can be met.
> > >
> > > Device Models
> > > -------------
> > > Devices have a *hardware interface* consisting of hardware registers,
> > > interrupts, and so on.
> > >
> > > The hardware interface together with the device state representation is
> > > called a *device model*. Device models can be assigned URIs such as
> > > https://qemu.org/devices/e1000e to uniquely identify them.
> >
> > Like others, I think we should either
> >
> > a) Give a relatively strong requirement regarding what is at the URL in
> > question, e.g. docs, maybe even a machine-readable schema describing
> > configuration and state for the device. Leaving the option "there can be
> > nothing here" is IMO asking for trouble.
> >
> > b) simply call that a unique ID, and then either drop the https: entirely or
> > use something else, like pci:// or, to be more specific, vfio://
> >
> > I'd favor option (b) for a different practical reason. URLs are subject to
> > redirection and other mishaps.
> > For example, using https:// begs the question whether
> > https://qemu.org/devices/e1000e and
> > https://www.qemu.org/devices/e1000e
> > should be treated as the same device. I believe that your intent is that
> > they shouldn't, but if the qemu web server redirects to www, and someone
> > wants to copy-paste their web browser's URL bar to the command line, they'd
> > get the wrong one.
>
> That's not a real world problem IMHO, because neither of these URLs
> ever needs to resolve to a real webpage, and thus neither needs to be
> cut + pasted from a browser.
>
> They are simply expressing a resource identifier using a URI as a
> convenient format. This is the same as an XML namespace using a URI,
> and rarely, if ever, resolving to any actual web page.
>
> This is a good thing, because if you say there needs to be a real page
> there, then it creates a pile of corporate bureaucracy for contributors.
> I can freely create a URI under https://redhat.com for purposes of being
> an identifier, but I cannot get any content published there without jumping
> through many tedious corporate approvals and stand a good chance of being
> rejected.
>
> If we're truly treating the URIs as an opaque string, we don't especially
> need to define any rules other than to say it should be under a domain that
> you have authority over either directly, or via membership of a project
> that delegates. We can suggest "https" since seeing "http" is a red flag
> for many people these days.

Hmm, an opaque string, sort of like the existing "name" attribute we have
now where Christophe quoted some examples in his message. Thanks,

Alex
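
For illustration only, the opaque-string reading discussed in this thread could be sketched as below. This is a minimal sketch, not QEMU or VFIO code: the function names `same_device_model` and `check_migration_compat` are hypothetical, and the only assumption taken from the thread is that device model identifiers are compared byte for byte with no URI normalization or network lookup, so the www and non-www forms count as different models and a mismatch fails early with a clear message.

```python
def same_device_model(a: str, b: str) -> bool:
    """Compare two device model identifiers as opaque strings.

    Hypothetical helper: no URI parsing, no normalization, no redirects.
    """
    return a == b


def check_migration_compat(source_model: str, dest_model: str) -> None:
    """Fail early, before any device state is transferred, on a mismatch.

    Hypothetical helper illustrating the "fail early with a clear error
    message" requirement from the quoted document.
    """
    if not same_device_model(source_model, dest_model):
        raise ValueError(
            "device model mismatch: source is "
            f"{source_model!r}, destination is {dest_model!r}"
        )


if __name__ == "__main__":
    # Identical identifiers are accepted.
    check_migration_compat(
        "https://qemu.org/devices/e1000e",
        "https://qemu.org/devices/e1000e",
    )
    # The www variant is a different opaque string and is rejected.
    try:
        check_migration_compat(
            "https://qemu.org/devices/e1000e",
            "https://www.qemu.org/devices/e1000e",
        )
    except ValueError as err:
        print(err)
```

Under this reading, whether a web server redirects one host name to the other does not matter, because nothing is ever fetched.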