That does sound like some sort of libvirt bug, then. I don't know why it would fail to transfer with "unknown CPU feature" when the source VM XML is not calling for it or for a model that would include it.
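For anyone who wants to double-check what the running VM's XML actually asks for, something like the rough sketch below might help. It only parses the `<cpu>` element of a domain XML, for example the output of `virsh dumpxml` saved to a string. The function names (`cpu_model`, `cpu_features`, `requires_feature`) and the sample XML are made up for illustration, and the sketch assumes that a feature with no `policy` attribute is treated as `require`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, shaped like the <cpu> section discussed in this thread.
SAMPLE_XML = """
<domain type='kvm'>
  <name>example-vm</name>
  <cpu mode='custom' match='exact'>
    <model fallback='forbid'>EPYC-IBPB</model>
    <feature policy='disable' name='npt'/>
  </cpu>
</domain>
"""


def cpu_model(domain_xml):
    """Return the CPU model named in the domain XML, or None if absent."""
    model = ET.fromstring(domain_xml).find("cpu/model")
    return model.text if model is not None else None


def cpu_features(domain_xml):
    """Return {feature name: policy} for every <feature> under <cpu>."""
    root = ET.fromstring(domain_xml)
    return {
        feature.get("name"): feature.get("policy", "require")  # assumed default
        for feature in root.findall("cpu/feature")
    }


def requires_feature(domain_xml, name):
    """True if the XML explicitly asks for the feature (require or force)."""
    return cpu_features(domain_xml).get(name) in ("require", "force")


if __name__ == "__main__":
    print("model:", cpu_model(SAMPLE_XML))
    print("features:", cpu_features(SAMPLE_XML))
    print("asks for npt:", requires_feature(SAMPLE_XML, "npt"))
```

This only shows what the XML explicitly requests. It says nothing about flags that the CPU model definition itself pulls in on a given libvirt version, which is the part that appears to differ between the two hosts.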
On Sat, Dec 11, 2021 at 3:32 AM Wido den Hollander <w...@widodh.nl> wrote:
>
>
> On 11-12-2021 at 00:52, Marcus wrote:
> > Just for clarity - Wido you mention that you tried using a common CPU model
> > across the platforms (which presumably doesn't contain npt) but migration
> > still fails on npt missing. That does seem like a bug of some sort, I would
> > expect that the following should work:
> >
>
> Indeed, that failed.
>
> > * Update cloudstack agent configs to use 'EPYC-IBPB' common identical
> > model, restart agent
> > * Stop VM on source host (ubuntu 20.04)
> > * Start VM on source host (ubuntu 20.04) - at this point you should not
> > have a feature 'npt' in the XML of the running VM. If you do then there's
> > something wrong with the EPYC-IBPB or libvirt's interpretation
> > * Attempt to migrate to destination host (ubuntu 18.04)
> >
> > Is this process failing? Just want to ensure the source VM was restarted
> > and does not contain npt in the XML (and also on the resulting qemu command
> > line), but still the migration complains about missing that feature.
> >
>
> I tried with EPYC-IBPB as well and restarted the VM prior to the migration.
>
> 20.04 -> 18.04 fails even though the IBPB model in libvirt is exactly
> the same between 18 and 20.
>
> It complains about the npt feature lacking and thus the migration fails.
>
> > I'm also making an assumption here that /proc/cpuinfo on an Epyc 7552 does
> > not have npt, but an Epyc 7662 does. Is that correct?
> >
>
> Correct.
>
> > On Tue, Dec 7, 2021 at 6:46 AM Gabriel Bräscher <gabrasc...@gmail.com>
> > wrote:
> >
> >> Paul, I confused the issues then.
> >>
> >> The one I mentioned fits only with what Wido reported in this thread.
> >> The CPU flag matches with the ones raised on that bug. Flags like *npt* &
> >> *nrip-save* which are present when SVM is enabled.
> >> Therefore, affected by kernel commit -- 52297436199d ("kvm: svm: Update
> >> svm_xsaves_supported").
> >> Additionally, the OS/Qemu versions also do fit with what is reported on
> >> Ubuntu's qemu package "bug #1887490".
> >>
> >> Regards
> >>
> >> On Tue, Dec 7, 2021 at 12:10 PM Paul Angus <p...@angus.uk.com.invalid>
> >> wrote:
> >>
> >>> The qemu-ev 2.10 bug was first reported a year or two ago in the mailing
> >>> lists.
> >>>
> >>> -----Original Message-----
> >>> From: Gabriel Bräscher <gabrasc...@gmail.com>
> >>> Sent: Tuesday, December 7, 2021 9:41 AM
> >>> To: dev <dev@cloudstack.apache.org>
> >>> Subject: Re: Live migration between AMD Epyc and Ubuntu 18.04 and 20.04
> >>>
> >>> Just adding to the "qemu-ev 2.10" & "qemu-ev 2.12" point.
> >>>
> >>>> migration fails from qemu-ev 2.10 to qemu-ev 2.12, this is definitely
> >>>> a bug in my point of view.
> >>>>
> >>>
> >>> On the comment 53 (at "bug #1887490"):
> >>>
> >>>> It seems *one of the patches also introduced a regression*:
> >>>> * lp-1887490-cpu_map-Add-missing-AMD-SVM-features.patch
> >>>> adds various SVM-related flags. Specifically *npt and nrip-save are
> >>>> now expected to be present by default* as shown in the updated testdata.
> >>>> This however breaks migration from instances using EPYC or EPYC-IBPB
> >>>> CPU models started with libvirt versions prior to this one because the
> >>>> instance on the target host has these extra flags
> >>>
> >>>
> >>> From the tests reported there, it fails in both ways.
> >>> 1. From *older* qemu package to *newer*:
> >>> *source* host does not map the CPU flag; however, *target* host
> >>> expects the flag to be there, by default.
> >>> 2. From *newer* qemu package to *older*:
> >>> the instance "domain.xml" in the *source* host has a CPU flag that is
> >>> not mapped by qemu in the *target* host.
> >>>
> >>>
> >>>
> >>> On Tue, Dec 7, 2021 at 10:22 AM Sven Vogel <s.vo...@ewerk.com> wrote:
> >>>
> >>>> Let me check. We had the same problem on RHEL/CentOS but I am not sure
> >>>> if this is a bug. What I know is that there was a change in the XML.
> >>>> Let me ask one of my colleagues in my team.
> >>>>
> >>>> 😉
> >>>>
> >>>>
> >>>> __
> >>>>
> >>>> Sven Vogel
> >>>> Senior Manager Research and Development - Cloud and Infrastructure
> >>>>
> >>>> EWERK DIGITAL GmbH
> >>>> Brühl 24, D-04109 Leipzig
> >>>> P +49 341 42649 - 99
> >>>> F +49 341 42649 - 98
> >>>> s.vo...@ewerk.com
> >>>> www.ewerk.com
> >>>>
> >>>> Managing Directors:
> >>>> Dr. Erik Wende, Hendrik Schubert, Tassilo Möschke
> >>>> Registration court: Leipzig HRB 9065
> >>>>
> >>>> Support:
> >>>> +49 341 42649 555
> >>>>
> >>>> Certified according to:
> >>>> ISO/IEC 27001:2013
> >>>> DIN EN ISO 9001:2015
> >>>> DIN ISO/IEC 20000-1:2018
> >>>>
> >>>> ISAE 3402 Type II Assessed
> >>>>
> >>>> EWERK-Blog<https://blog.ewerk.com/> |
> >>>> LinkedIn<https://www.linkedin.com/company/ewerk-group> |
> >>>> Xing<https://www.xing.com/company/ewerk> |
> >>>> Twitter<https://twitter.com/EWERK_Group> |
> >>>> Facebook<https://de-de.facebook.com/EWERK.Group/>
> >>>>
> >>>>
> >>>> Information and offers sent by email are subject to change and non-binding.
> >>>>
> >>>> Disclaimer Privacy:
> >>>> The contents of this e-mail (including any attachments) are
> >>>> confidential and may be legally privileged. If you are not the
> >>>> intended recipient of this e-mail, any disclosure, copying,
> >>>> distribution or use of its contents is strictly prohibited, and you
> >>>> should please notify the sender immediately and then delete it
> >>>> (including any attachments) from your system.
> >>>> Thank you.
> >>>>
> >>>> From: Gabriel Bräscher <gabrasc...@gmail.com>
> >>>> Date: Tuesday, 7 December 2021 at 09:57
> >>>> To: dev <dev@cloudstack.apache.org>
> >>>> Subject: Re: Live migration between AMD Epyc and Ubuntu 18.04 and
> >>>> 20.04
> >>>>
> >>>> Wei, I agree.
> >>>> This is not necessarily a bug per se.
> >>>>
> >>>> The main point here is: the issue we are seeing is the "bug #1887490"
> >>>> raised in Ubuntu's qemu package.
> >>>> CPU features were added on the newer releases, which caused the
> >>>> compatibility issue when (live) migrating VMs between compatible
> >>>> hardware but different qemu packages.
> >>>>
> >>>>
> >>>> On Tue, Dec 7, 2021 at 9:26 AM Wei ZHOU <ustcweiz...@gmail.com> wrote:
> >>>>
> >>>>> Hi Gabriel,
> >>>>>
> >>>>> In my opinion, migration should work from lower version to higher version,
> >>>>> but no guarantee from higher version to lower version, like we
> >>>>> upgrade cloudstack.
> >>>>> Therefore, migration should work from ubuntu 18.04 to ubuntu 20.04.
> >>>>> But it is not a bug if migration fails from ubuntu 20.04 to ubuntu 18.04.
> >>>>>
> >>>>> As Paul said, migration fails from qemu-ev 2.10 to qemu-ev 2.12,
> >>>>> this is definitely a bug in my point of view.
> >>>>>
> >>>>> -Wei
> >>>>>
> >>>>> On Mon, 6 Dec 2021 at 16:05, Gabriel Bräscher <gabrasc...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Paul (& all),
> >>>>>>
> >>>>>> I strongly believe that this is a bug in QEMU.
> >>>>>> I was looking for bugs and found something that looks related to
> >>>>>> what we are seeing.
> >>>>>> Precisely at Ubuntu's bug #*1887490*
> >>>>>> <https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1887490>:
> >>>>>> https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1887490
> >>>>>>
> >>>>>> In the link above, there was the following comment:
> >>>>>> https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1887490/comments/53
> >>>>>>
> >>>>>> It seems one of the patches also introduced a regression:
> >>>>>> * lp-1887490-cpu_map-Add-missing-AMD-SVM-features.patch adds various
> >>>>>> SVM-related flags. Specifically npt and nrip-save are now expected
> >>>>>> to be present by default as shown in the updated testdata. This however
> >>>>>> breaks migration from instances using *EPYC* or *EPYC-IBPB* CPU
> >>>>>> models started with libvirt versions prior to this one because the
> >>>>>> instance on the target host has these extra flags
> >>>>>>
> >>>>>>
> >>>>>> More about #*1887490*
> >>>>>> <https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1887490> can
> >>>>>> be found at the mail
> >>>>>> https://www.mail-archive.com/ubuntu-bugs@lists.ubuntu.com/msg5842376.html.
> >>>>>> We can see that the specific bug was addressed in "linux
> >>>>>> (5.4.0-49.53) focal".
> >>>>>>
> >>>>>> linux (5.4.0-49.53) focal; urgency=medium
> >>>>>>
> >>>>>> * Add/Backport EPYC-v3 and EPYC-Rome CPU model (LP: #1887490)
> >>>>>> - kvm: svm: Update svm_xsaves_supported
> >>>>>>
> >>>>>>
> >>>>>> Regards,
> >>>>>> Gabriel.
> >>>>>>
> >>>>>> On Fri, Dec 3, 2021 at 10:59 AM Paul Angus <paul.an...@ticketmaster.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Which version(s) of QEMU are you using Wido?
> >>>>>>>
> >>>>>>> We've just been upgrading CentOS 7.6 to 7.9. Most 7.6 hosts had
> >>>>>>> qemu-ev 2.10 on them (the buggy one). 2.12 was on the
> >>>>>>> new hosts.
> >>>>>>> We were getting errors complaining that the ibpb CPU feature
> >>>>>>> wasn't available when migrating to the updated OS hosts (even
> >>>>>>> though identical hardware).
> >>>>>>>
> >>>>>>> Upgrading qemu-ev to 2.12 on the originating host, then stopping
> >>>>>>> and starting the VMs, then allowed us to migrate. We couldn't
> >>>>>>> find any solution that didn't involve stopping and starting the VMs.
> >>>>>>>
> >>>>>>> Paul.
> >>>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Wido den Hollander <w...@widodh.nl>
> >>>>>>> Sent: Monday, November 29, 2021 7:57 AM
> >>>>>>> To: dev@cloudstack.apache.org; Wei ZHOU <ustcweiz...@gmail.com>
> >>>>>>> Subject: Re: Live migration between AMD Epyc and Ubuntu 18.04
> >>>>>>> and 20.04
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 11/24/21 10:36 PM, Wei ZHOU wrote:
> >>>>>>>> Hi Wido,
> >>>>>>>>
> >>>>>>>> I think it is not good to run an environment with two
> >>>>>>>> ubuntu/qemu versions.
> >>>>>>>> It always happens that some cpu features are supported in the higher
> >>>>>>>> version but not supported in the older version.
> >>>>>>>> From my experience, the migration from older version to higher version
> >>>>>>>> works like a charm, but there were many issues in migration
> >>>>>>>> from higher version to older version.
> >>>>>>>>
> >>>>>>>
> >>>>>>> I understand. But with a large number of hosts and working your
> >>>>>>> way through upgrades you sometimes run into these situations.
> >>>>>>> Therefore it would be welcome if it works.
> >>>>>>>
> >>>>>>>> I do not have a solution for you. I have tried to hack
> >>>>>>>> /etc/libvirt/hooks/qemu but it didn't work.
> >>>>>>>> Have you tried with other cpu models like x86_Opteron_G5 ? you
> >>>>>>>> can find the cpu features of each cpu model in
> >>>>>>>> /usr/share/libvirt/cpu_map/
> >>>>>>>>
> >>>>>>>
> >>>>>>> I have not tried that yet, but I can see if that works.
> >>>>>>>
> >>>>>>> The EPYC-IBPB CPU model is identical on 18.04 and 20.04, but
> >>>>>>> even using that model we can't seem to migrate as it complains
> >>>>>>> about the 'npt' feature.
> >>>>>>>
> >>>>>>> Wido
> >>>>>>>
> >>>>>>>> Anyway, even if the vm migration succeeds, you do not know if
> >>>>>>>> vm works fine. I believe the best solution is upgrading all
> >>>>>>>> hosts to the same OS version.
> >>>>>>>>
> >>>>>>>> -Wei
> >>>>>>>>
> >>>>>>>> On Tue, 23 Nov 2021 at 16:31, Wido den Hollander
> >>>>>>>> <w...@widodh.nl> wrote:
> >>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I'm trying to debug an issue with live migrations between
> >>>>>>>>> Ubuntu 18.04 and 20.04 machines each with different CPUs:
> >>>>>>>>>
> >>>>>>>>> - Ubuntu 18.04 with AMD Epyc 7552 (Rome)
> >>>>>>>>> - Ubuntu 20.04 with AMD Epyc 7662 (Milan)
> >>>>>>>>>
> >>>>>>>>> We are currently using this setting:
> >>>>>>>>>
> >>>>>>>>> guest.cpu.mode=custom
> >>>>>>>>> guest.cpu.model=EPYC
> >>>>>>>>>
> >>>>>>>>> This does not allow for live migrations:
> >>>>>>>>>
> >>>>>>>>> Ubuntu 20.04 with Epyc 7662 to Ubuntu 18.04 with Epyc 7552
> >>>>>>>>> fails
> >>>>>>>>>
> >>>>>>>>> "ExecutionException : org.libvirt.LibvirtException:
> >>>>>>>>> unsupported configuration: unknown CPU feature: npt"
> >>>>>>>>>
> >>>>>>>>> So we tried to define a set of features manually:
> >>>>>>>>>
> >>>>>>>>> guest.cpu.features=3dnowprefetch abm adx aes apic arat avx
> >>>>>>>>> avx2 bmi1 bmi2 clflush clflushopt cmov cr8legacy cx16 cx8 de
> >>>>>>>>> f16c fma fpu fsgsbase fxsr fxsr_opt lahf_lm lm mca mce
> >>>>>>>>> misalignsse mmx mmxext monitor movbe msr mtrr nx osvw pae pat
> >>>>>>>>> pclmuldq pdpe1gb pge pni popcnt pse pse36 rdrand rdseed rdtscp
> >>>>>>>>> sep sha-ni smap smep sse sse2 sse4.1 sse4.2 sse4a
> >>>>>>>>> ssse3 svm syscall tsc vme xgetbv1 xsave xsavec xsaveopt -npt
> >>>>>>>>> -x2apic -hypervisor -topoext -nrip-save
> >>>>>>>>>
> >>>>>>>>> This results in this going into the XML:
> >>>>>>>>>
> >>>>>>>>> <feature policy='disable' name='npt'/>
> >>>>>>>>>
> >>>>>>>>> You would say that works, but then the target host (18.04
> >>>>>>>>> with the 7552) says it doesn't support the feature 'npt' and
> >>>>>>>>> the migration still fails.
> >>>>>>>>>
> >>>>>>>>> Now we could of course use the kvm64 CPU from Qemu, but that's
> >>>>>>>>> lacking so many features that for example TLS offloading isn't
> >>>>>>>>> available.
> >>>>>>>>>
> >>>>>>>>> I also tried to set 'EPYC-Rome' on the Ubuntu 20.04
> >>>>>>>>> hypervisor, but it then complains on the Ubuntu 18.04
> >>>>>>>>> hypervisor that the CPU 'EPYC-Rome' is unknown as the 18.04
> >>>>>>>>> hypervisor doesn't have that profile.
> >>>>>>>>>
> >>>>>>>>> Any ideas on how to get this working?
> >>>>>>>>>
> >>>>>>>>> Wido
> >>>>>>>>>
> >>>>>>>>
> >>>>>>> This message is confidential and may be legally privileged or otherwise
> >>>>>>> protected from disclosure. If you are not the intended
> >>>>>>> recipient, please telephone or email the sender and delete this
> >>>>>>> message and any attachment from your system; you must not copy
> >>>>>>> or disclose the contents of this message or any attachment to
> >>>>>>> any other person. We may monitor email traffic and the content
> >>>>>>> of internal and external messages sent to and from us to
> >>>>>>> ensure compliance with internal policies and for the purposes of
> >>>>>>> security.
> >>>>>>>
> >>>>>>> Ticketmaster UK Limited. Registered Office: 30 St John Street,
> >>>>>>> London EC1M 4AY. Registered in England and Wales. Company
> >>>>>>> Number 02662632.