* Liran Alon (liran.a...@oracle.com) wrote: > > > > On 6 Jun 2019, at 12:23, Dr. David Alan Gilbert <dgilb...@redhat.com> wrote: > > > > * Liran Alon (liran.a...@oracle.com) wrote: > >> > >> > >>> On 6 Jun 2019, at 11:42, Dr. David Alan Gilbert <dgilb...@redhat.com> > >>> wrote: > >>> > >>> * Liran Alon (liran.a...@oracle.com) wrote: > >>>> Hi, > >>>> > >>>> Looking at QEMU source code, I am puzzled regarding how migration > >>>> backwards compatibility is preserved regarding X86CPU. > >>>> > >>>> As I understand it, fields that are based on KVM capabilities and guest > >>>> runtime usage are defined in VMState subsections in order to not send > >>>> them if not necessary. > >>>> This is done such that in case they are not needed and we migrate to an > >>>> old QEMU which don’t support loading this state, migration will still > >>>> succeed > >>>> (As .needed() method will return false and therefore this state won’t be > >>>> sent as part of migration stream). > >>>> Furthermore, in case .needed() returns true and old QEMU don’t support > >>>> loading this state, migration fails. As it should because we are aware > >>>> that guest state > >>>> is not going to be restored properly on destination. > >>>> > >>>> I’m puzzled about what will happen in the following scenario: > >>>> 1) Source is running new QEMU with new KVM that supports save of some > >>>> VMState subsection. > >>>> 2) Destination is running new QEMU that supports load this state but > >>>> with old kernel that doesn’t know how to load this state. > >>>> > >>>> I would have expected in this case that if source .needed() returns > >>>> true, then migration will fail because of lack of support in destination > >>>> kernel. > >>>> However, it seems from current QEMU code that this will actually succeed > >>>> in many cases. > >>>> > >>>> For example, if msr_smi_count is sent as part of migration stream (See > >>>> vmstate_msr_smi_count) and destination have has_msr_smi_count==false, > >>>> then destination will succeed loading migration stream but > >>>> kvm_put_msrs() will actually ignore env->msr_smi_count and will > >>>> successfully load guest state. > >>>> Therefore, migration will succeed even though it should have failed… > >>>> > >>>> It seems to me that QEMU should have for every such VMState subsection, > >>>> a .post_load() method that verifies that relevant capability is > >>>> supported by kernel > >>>> and otherwise fail migration. > >>>> > >>>> What do you think? Should I really create a patch to modify all these > >>>> CPUX86 VMState subsections to behave like this? > >>> > >>> I don't know the x86 specific side that much; but from my migration side > >>> the answer should mostly be through machine types - indeed for smi-count > >>> there's a property 'x-migrate-smi-count' which is off for machine types > >>> pre 2.11 (see hw/i386/pc.c pc_compat_2_11) - so if you've got an old > >>> kernel you should stick to the old machine types. > >>> > >>> There's nothing guarding running the new machine type on old-kernels; > >>> and arguably we should have a check at startup that complains if > >>> your kernel is missing something the machine type uses. > >>> However, that would mean that people running with -M pc would fail > >>> on old kernels. > >>> > >>> A post-load is also a valid check; but one question is whether, > >>> for a particular register, the pain is worth it - it depends on the > >>> symptom that the missing state causes. If it's minor then you might > >>> conclude it's not worth a failed migration; if it's a hung or > >>> corrupt guest then yes it is. Certainly a warning printed is worth > >>> it. > >>> > >>> Dave > >> > >> I think we should have flags that allow user to specify which VMState > >> subsections user explicitly allow to avoid restore even though they are > >> required to fully restore guest state. > >> But it seems to me that the behaviour should be to always fail migration > >> in case we load a VMState subsections that we are unable to restore unless > >> user explicitly specified this is ok > >> for this specific subsection. > >> Therefore, it seems that for every VMState subsection that it’s restore is > >> based on kernel capability we should: > >> 1) Have a user-controllable flag (which is also tied to machine-type?) to > >> explicitly allow avoid restoring this state if cannot. Default should be > >> “false”. > >> 2) Have a .post_load() method that verifies we have required kernel > >> capability to restore this state, unless flag (1) was specified as “true”. > > > > This seems a lot of flags; users aren't going to know what to do with > > all of them; I don't see what will set/control them. > > True but I think users will want to specify only for a handful of VMState > subsections that it is OK to not restore them even thought hey are deemed > needed by source QEMU. > We can create flags only for those VMState subsections. > User should set these flags explicitly on QEMU command-line. As a “-cpu” > property? I don’t think these flags should be tied to machine-type.
I don't see who is going to work out these flags and send them. > > > >> Note that above mentioned flags is different than flags such as > >> “x-migrate-smi-count”. > >> The purpose of “x-migrate-smi-count” flag is to avoid sending the VMState > >> subsection to begin with in case we know we migrate to older QEMU which > >> don’t even have the relevant VMState subsection. But it is not relevant > >> for the case both source and destination runs QEMU which understands the > >> VMState subsection but run on kernels with different capabilities. > >> > >> Also note regarding your first paragraph, that specifying flags based on > >> kernel you are running on doesn’t help for the case discussed here. > >> As source QEMU is running on new kernel. Unless you meant that source QEMU > >> should use relevant machine-type based on the destination kernel. > >> i.e. You should launch QEMU with old machine-type as long as you have > >> hosts in your migration pool that runs with old kernel. > > > > That's what I meant; stick to the old machine-type unless you know it's > > safe to use a newer one. > > > >> I don’ think it’s the right approach though. As there is no way to change > >> flags such as “x-migrate-smi-count” dynamically after all hosts in > >> migration pool have been upgraded. > >> > >> What do you think? > > > > I don't have an easy answer. The users already have to make sure they > > use a machine type that's old enough for all the QEMUs installed in > > their cluster; making sure it's also old enough for their oldest > > kernel isn't too big a difference - *except* that it's much harder to > > tell which kernel corresponds to which feature/machine type etc - so > > how does a user know what the newest supported machine type is? > > Failing at startup when selecting a machine type that the current > > kernel can't support would help that. > > > > Dave > > First, machine-type express the set of vHW behaviour and properties that is > exposed to guest. > Therefore, machine-type shouldn’t change for a given guest lifetime > (including Live-Migrations). > Otherwise, guest will experience different vHW behaviour and properties > before/after Live-Migration. > So I think machine-type is not relevant for this discussion. We should focus > on flags which specify > migration behaviour (such as “x-migrate-smi-count” which can also be > controlled by machine-type but not only). Machine type specifies two things: a) The view from the guest b) Migration compatibility (b) is explicitly documented in qemu's docs/devel/migration.rst, see the subsection on subsections. > Second, this strategy results in inefficient migration management. Consider > the following scenario: > 1) Guest running on new_qemu+old_kernel migrate to host with > new_qemu+new_kernel. > Because source is old_kernel than destination QEMU is launched with > (x-migrate-smi-count == false). > 2) Assume at this point fleet of hosts have half of hosts with old_kernel and > half with new_kernel. > 3) Further assume that guest workload indeed use msr_smi_count and therefore > relevant VMState subsection should be sent to properly preserve guest state. > 4) From some reason, we decide to migrate again the guest in (1). > Even if guest is migrated to a host with new_kernel, then QEMU still avoids > sending msr_smi_count VMState subsection because it is launched with > (x-migrate-smi-count == false). > > Therefore, I think it makes more sense that source QEMU will always send all > VMState subsection that are deemed needed (i.e. .nedeed() returns true) > and let receive-side decide if migration should fail if this subsection was > sent but failed to be restored. > The only case which I think sender should limit the VMState subsection it > sends to destination is because source is running older QEMU > which is not even aware of this VMState subsection (Which is to my > understanding the rational behind using “x-migrate-smi-count” and tie it up > to machine-type). But we want to avoid failed migrations if we can; so in general we don't want to be sending subsections to destinations that can't handle them. The only case where it's reasonable is when there's a migration bug such that the behaviour in the guest is really nasty; if there's a choice between a failed migration or a hung/corrupt guest I'll take a failed migration. > Third, let’s assume all hosts in fleet was upgraded to new_kernel. How do I > modify all launched QEMUs on these new hosts to now have > “x-migrate-smi-count” set to true? > As I would like future migrations to do send this VMState subsection. > Currently there is no QMP command to update these flags. I guess that's possible - it's pretty painful though; you're going to have to teach your management layer about features/fixes of the kernels and which flags to tweak in qemu. Having said that, if you could do it, then you'd avoid having to restart VMs to pick up a few fixes. > Fourth, I think it’s not trivial for management-plane to be aware with which > flags it should set on destination QEMU based on currently running kernels on > fleet. > It’s not the same as machine-type, as already discussed above doesn’t change > during the entire lifetime of guest. Right, which is why I don't see your idea of adding flags will work. I don't see how anything will figure out what the right flags to use are. (Getting the management layers to do sane things with the cpuid flags is already a nightmare, and they're fairly well understood). > I’m also not sure it is a good idea that we currently control flags such as > “x-migrate-smi-count” from machine-type. > As it means that if a guest was initially launched using some old QEMU, it > will *forever* not migrate some VMState subsection during all it’s > Live-Migrations. > Even if all hosts and all QEMUs on fleet are capable of migrating this state > properly. > Maybe it is preferred that this flag was specified as part of “migrate” > command itself in case management-plane knows it wishes to migrate even > though dest QEMU > is older and doesn’t understand this specific VMState subsection. > > I’m left pretty confused about QEMU’s migration compatibility strategy... The compatibility strategy is the machine type; but yes it does have a problem when it's not really just a qemu version - but also kernel (and external libraries, etc). My general advice is that users should be updating their kernels and qemus together; but I realise there's lots of cases where that doesn't work. Dave > -Liran > > > > >> -Liran > >> > >>> > >>>> Thanks, > >>>> -Liran > >>> -- > >>> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK > >> > > -- > > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK