On Thursday, July 4, 2024 2:04 AM, Peter Xu wrote: > On Wed, Jul 03, 2024 at 10:49:12PM +0800, Wei Wang wrote: > > When enforce_cpuid is set to false, the guest is launched with a > > filtered set of features, meaning that unsupported features by the > > host are removed from the guest's vCPU model. This could cause issues for > live migration. > > For example, a guest on the source is running with features A and B. > > If the destination host does not support feature B, the stub guest can > > still be launched on the destination with feature A only if > enforce_cpuid=false. > > Live migration can start in this case, though it may fail later when > > the states of feature B are put to the destination side. This failure > > occurs in the late stage (i.e., stop© phase) of the migration > > flow, where the source guest has already been paused. Tests show that > > in such cases the source guest does not recover, and the destination > > is unable to resume to run. > > > > Make "enfore_cpuid=true" a hard requirement for a guest to be > > migratable, and change the default value of "enforce_cpuid" to true, > > making the guest vCPUs migratable by default. If the destination stub > > guest has inconsistent CPUIDs (i.e., destination host cannot support > > the features defined by the guest's vCPU model), it fails to boot > > (with enfore_cpuid=true by default), thereby preventing migration from > > occuring. If enfore_cpuid=false is explicitly added for the guest, the > > guest is deemed as non-migratable (via the migration blocker), so the > > above issue won't occur as the guest won't be migrated. > > > > Tested-by: Lei Wang <lei4.w...@intel.com> > > Signed-off-by: Wei Wang <wei.w.w...@intel.com> > > [Copy Jiri and Dan for libvirt-side implications]
Thanks! > > > --- > > target/i386/cpu.c | 2 +- > > target/i386/kvm/kvm.c | 25 +++++++++++++++---------- > > 2 files changed, 16 insertions(+), 11 deletions(-) > > > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c index > > 4c2e6f3a71..7db4fe4ead 100644 > > --- a/target/i386/cpu.c > > +++ b/target/i386/cpu.c > > @@ -8258,7 +8258,7 @@ static Property x86_cpu_properties[] = { > > DEFINE_PROP_UINT32("hv-version-id-snumber", X86CPU, > > hyperv_ver_id_sn, 0), > > > > DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true), > > - DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false), > > + DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, true), > > I assume in many cases people can still properly migrate when the hosts are > similar or identical, so maybe we at least want the old machine types keep > working (by introducing a machine compat property)? You meant keeping "enforce_cpuid=false" for old machine types (e.g. before 9.1)? This will make them non-migratable with this patch, but they were migratable (by default) as "migratable" wasn't enforced by "enforce_cpuid". Should we keep them being migratable by default (e.g. enforce_cpuid=true) as well? > > > DEFINE_PROP_BOOL("x-force-features", X86CPU, force_features, false), > > DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true), > > DEFINE_PROP_UINT32("phys-bits", X86CPU, phys_bits, 0), diff --git > > a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c index > > dd8b0f3313..aee717c1cf 100644 > > --- a/target/i386/kvm/kvm.c > > +++ b/target/i386/kvm/kvm.c > > @@ -1741,7 +1741,7 @@ static int hyperv_init_vcpu(X86CPU *cpu) > > return 0; > > } > > > > -static Error *invtsc_mig_blocker; > > +static Error *cpu_mig_blocker; > > > > #define KVM_MAX_CPUID_ENTRIES 100 > > > > @@ -2012,6 +2012,15 @@ full: > > abort(); > > } > > > > +static bool kvm_vcpu_need_block_migration(X86CPU *cpu) { > > + CPUX86State *env = &cpu->env; > > + > > + return !cpu->enforce_cpuid || > > + (!env->user_tsc_khz && (env->features[FEAT_8000_0007_EDX] & > > + CPUID_APM_INVTSC)); } > > Nit: maybe it's nice this returns a "const char*" with detailed reasons to be > put > into the error_setg(), so it dumps the same as before for the invtsc blocker. Sounds good. I'll check how to incorporate such info into the logs.