On 02/21/17 02:43, Michael S. Tsirkin wrote:
> On Mon, Feb 20, 2017 at 09:55:40PM +0100, Laszlo Ersek wrote:
>> On 02/20/17 21:45, Eric Blake wrote:
>>> On 02/20/2017 02:19 PM, Dr. David Alan Gilbert wrote:
>>>> * Eric Blake (ebl...@redhat.com) wrote:
>>>>> On 02/20/2017 04:23 AM, Dr. David Alan Gilbert wrote:
>>>>>> * Laszlo Ersek (ler...@redhat.com) wrote:
>>>>>>> CC Dave
>>>>>>
>>>>>> This isn't an area I really understand; but if I'm reading this
>>>>>> right then vmgenid is stored in fw_cfg?
>>>>>> fw_cfg isn't migrated
>>>>>>
>>>>>> So why should any changes to it get migrated, except if it's
>>>>>> already been read by the guest (and if the guest reads it again
>>>>>> afterwards, what's it expected to read?)
>>>>>
>>>>> Why are we expecting it to change on migration? You want a new value
>>>>
>>>> I'm not; I was asking why a change made prior to migration would be
>>>> preserved across migration.
>>>
>>> Okay, so you're asking what happens if the source requests the
>>> vmgenid device, and sets an id, but the destination of the migration
>>> does not request anything
>>
>> This should never happen, as it means different QEMU command lines on
>> source vs. target hosts. (Different as in "incorrectly different".)
>>
>> Dave writes, "a change made prior to migration". Change made to what?
>>
>> - the GUID cannot be changed via the monitor once QEMU has been
>>   started. We dropped the monitor command for that, due to lack of a
>>   good use case, and due to lifecycle complexities. We have figured
>>   out a way to make it safe, but until there's a really convincing
>>   use case, we shouldn't add that complexity.
>
> True, but we might in the future, and it seems prudent to make the
> migration stream future-proof for that.
It is already. The monitor command, if we add it, can be implemented
incrementally. I described it as "approach (iii)" elsewhere in the
thread. This is a more detailed recap:

- introduce a new device property (internal only), such as
  "x-enable-set-vmgenid". Make it reflect whether a given machine type
  supports the monitor command.

- change the /etc/vmgenid_guid fw_cfg blob from callback-less to one
  with a selection callback

- add a new boolean latch to the vmgenid device, called
  "guid_blob_selected" or something similar

- the reset handler sets the latch to FALSE (NB: the reset handler
  already sets /etc/vmgenid_addr to zero)

- the select callback for /etc/vmgenid_guid sets the latch to TRUE

- the latch is added to the migration stream as a subsection *if*
  x-enable-set-vmgenid is TRUE

- the set-vmgenid monitor command checks all three of:
  x-enable-set-vmgenid, the latch, and the contents of
  /etc/vmgenid_addr:

  - if x-enable-set-vmgenid is FALSE, the monitor command returns
    QERR_UNSUPPORTED (this is a generic error class, with an
    "unsupported" error message). Otherwise,

  - if the latch is TRUE *and* /etc/vmgenid_addr is zero, then the
    guest firmware has executed (or started executing) ALLOCATE for
    /etc/vmgenid_guid, but it has not executed WRITE_POINTER yet.
    Updating the VMGENID from the monitor is unsafe at this point (we
    cannot guarantee informing the guest successfully), so the monitor
    command fails with ERROR_CLASS_DEVICE_NOT_ACTIVE. The caller
    should simply retry a bit later. (By which time the firmware will
    likely have programmed /etc/vmgenid_addr.) Libvirt can recognize
    this error specifically, because it is not the generic error
    class. ERROR_CLASS_DEVICE_NOT_ACTIVE practically stands for
    "EAGAIN" in this case.
  - Otherwise -- meaning the latch is FALSE *or* /etc/vmgenid_addr is
    nonzero; that is, the guest has either not run ALLOCATE since
    reset, *or* it has, but it has also run WRITE_POINTER --:

    - refresh the GUID within the fw_cfg blob for /etc/vmgenid_guid
      in-place -- the guest will see this whenever it runs ALLOCATE
      for /etc/vmgenid_guid, *AND*

    - if /etc/vmgenid_addr is not zero, then update the guest (that
      is, RAM write + SCI)

Thanks
Laszlo

>
>> - the address of the GUID is changed (the firmware programs it from
>>   "zero" to an actual address, in a writeable fw_cfg file), and that
>>   piece of info is explicitly migrated, as part of the vmgenid
>>   device's vmsd.
>>
>> Thanks
>> Laszlo
>>
>>
>>> - how does the guest on the destination see the same id
>>> as was in place on the source at the time migration started.
>>>
>>>>
>>>>
>>>>> when you load state from disk (you don't know how many times the
>>>>> same state has been loaded previously, so each load is effectively
>>>>> forking the VM and you want a different value), but for a single
>>>>> live migration, you aren't forking the VM and don't need a new
>>>>> generation ID.
>>>>>
>>>>> I guess it all boils down to what command line you're using: if
>>>>> libvirt is driving a live migration, it will request the same UUID
>>>>> in the command line of the destination as what is on the source;
>>>>> while if libvirt is loading from a [managed]save to restore state
>>>>> from a file, it will either request a new UUID directly or request
>>>>> auto to let qemu generate the new id.
>>>>
>>>> Hmm now I've lost it a bit; I thought we would preserve the value
>>>> transmitted from the source, not the value on the command line of
>>>> the destination.
>>>
>>> I guess I'm trying to figure out whether libvirt MUST read the
>>> current id and explicitly tell the destination of migration to reuse
>>> that id, or if libvirt can omit the id on migration and everything
>>> just works because the id was migrated from the source.
>>>