That does sound like some sort of libvirt bug, then. I don't know why it would
fail to transfer with "unknown CPU feature" when the source VM XML is not
calling for it or for a model that would include it.
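
One way to double-check what the source VM XML is actually calling for is to list the <feature> elements under <cpu> in the output of `virsh dumpxml`. The following is only a rough sketch: the function name and the sample XML are made up for illustration and are not taken from Wido's hosts.

```python
import xml.etree.ElementTree as ET


def cpu_features(domain_xml):
    """Return {feature name: policy} for each <feature> under <cpu>."""
    root = ET.fromstring(domain_xml)
    cpu = root.find("cpu")
    if cpu is None:
        return {}
    return {f.get("name"): f.get("policy", "") for f in cpu.findall("feature")}


# Hypothetical domain XML, shaped like `virsh dumpxml` output.
SAMPLE = """
<domain type='kvm'>
  <name>example-vm</name>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>EPYC-IBPB</model>
    <feature policy='disable' name='npt'/>
  </cpu>
</domain>
"""

if __name__ == "__main__":
    print(cpu_features(SAMPLE))
```

If 'npt' shows up in that mapping for the running VM, the XML itself is still naming the feature, whatever the CPU model says.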

On Sat, Dec 11, 2021 at 3:32 AM Wido den Hollander <w...@widodh.nl> wrote:

>
>
> On 11-12-2021 at 00:52, Marcus wrote:
> > Just for clarity - Wido, you mention that you tried using a common CPU model
> > across the platforms (which presumably doesn't contain npt) but migration
> > still fails on npt missing. That does seem like a bug of some sort; I would
> > expect that the following should work:
> >
>
> Indeed, that failed.
>
> > * Update cloudstack agent configs to use 'EPYC-IBPB' common identical
> > model, restart agent
> > * Stop VM on source host (ubuntu 20.04)
> > * Start VM on source host (ubuntu 20.04) - at this point you should not
> > have a feature 'npt' in the XML of the running VM. If you do then there's
> > something wrong with the EPYC-IBPB or libvirt's interpretation
> > * Attempt to migrate to destination host (ubuntu 18.04)
> >
> > Is this process failing? Just want to ensure the source VM was restarted
> > and does not contain npt in the XML (and also on the resulting qemu command
> > line), but still the migration complains about missing that feature.
> >
>
> I tried with EPYC-IBPB as well and restarted the VM prior to the migration.
>
> 20.04 -> 18.04 fails even though the EPYC-IBPB model in libvirt is exactly
> the same between 18.04 and 20.04.
>
> It complains that the npt feature is lacking and thus the migration fails.
>
> > I'm also making an assumption here that /proc/cpuinfo on an Epyc 7552 does
> > not have npt, but an Epyc 7662 does. Is that correct?
> >
>
> Correct.
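
For anyone wanting to repeat that check on their own hosts, a minimal sketch of reading the flags line from /proc/cpuinfo could look like this. The helper names and the sample text are illustrative assumptions, not output from the hosts in this thread.

```python
def cpuinfo_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line found."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def host_has_flag(flag, path="/proc/cpuinfo"):
    """Check the local host; only meaningful on Linux."""
    with open(path) as fh:
        return flag in cpuinfo_flags(fh.read())


# Made-up excerpt in the /proc/cpuinfo layout, for illustration only.
SAMPLE_CPUINFO = "processor\t: 0\nmodel name\t: Example CPU\nflags\t\t: fpu vme svm npt\n"

if __name__ == "__main__":
    print("npt" in cpuinfo_flags(SAMPLE_CPUINFO))
```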
>
> > On Tue, Dec 7, 2021 at 6:46 AM Gabriel Bräscher <gabrasc...@gmail.com>
> > wrote:
> >
> >> Paul, I confused the issues then.
> >>
> >> The one I mentioned fits only with what Wido reported in this thread.
> >> The CPU flags match the ones raised in that bug: flags like *npt* &
> >> *nrip-save*, which are present when SVM is enabled.
> >> They are therefore affected by kernel commit 52297436199d ("kvm: svm: Update
> >> svm_xsaves_supported").
> >> Additionally, the OS/Qemu versions also fit with what is reported in
> >> Ubuntu's qemu package "bug #1887490".
> >>
> >> Regards
> >>
> >> On Tue, Dec 7, 2021 at 12:10 PM Paul Angus <p...@angus.uk.com.invalid>
> >> wrote:
> >>
> >>> The qemu-ev 2.10 bug was first reported a year or two ago in the mailing
> >>> lists.
> >>>
> >>> -----Original Message-----
> >>> From: Gabriel Bräscher <gabrasc...@gmail.com>
> >>> Sent: Tuesday, December 7, 2021 9:41 AM
> >>> To: dev <dev@cloudstack.apache.org>
> >>> Subject: Re: Live migration between AMD Epyc and Ubuntu 18.04 and 20.04
> >>>
> >>> Just adding to the "qemu-ev 2.10" & "qemu-ev 2.12" point.
> >>>
> >>>> migration fails from qemu-ev 2.10 to qemu-ev 2.12, this is definitely
> >>>> a bug in my point of view.
> >>>>
> >>>
> >>> On the comment 53 (at "bug #1887490"):
> >>>
> >>>> It seems *one of the patches also introduced a regression*:
> >>>> * lp-1887490-cpu_map-Add-missing-AMD-SVM-features.patch
> >>>> adds various SVM-related flags. Specifically *npt and nrip-save are
> >>>> now expected to be present by default* as shown in the updated testdata.
> >>>> This however breaks migration from instances using EPYC or EPYC-IBPB
> >>>> CPU models started with libvirt versions prior to this one because the
> >>>> instance on the target host has these extra flags
> >>>
> >>>
> >>> From the tests reported there, it fails in both directions.
> >>> 1. From *older* qemu package to *newer*:
> >>>      the *source* host does not map the CPU flag; however, the *target*
> >>>      host expects the flag to be there by default.
> >>> 2. From *newer* qemu package to *older*:
> >>>      the instance "domain.xml" on the *source* host has a CPU flag that
> >>>      is not mapped by qemu on the *target* host.
> >>>
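
The two directions described above can be pictured as two set differences. This is a toy model only: the function and the flag sets are invented for illustration, and it is not how libvirt or qemu actually perform the check.

```python
def migration_problems(source_flags, target_known, target_expected_default):
    """Toy model of the two failure modes.

    unknown_on_target:  flags in the source domain XML that the target
                        does not map (newer -> older).
    expected_by_target: flags the target expects by default but the
                        source instance lacks (older -> newer).
    """
    source = set(source_flags)
    return {
        "unknown_on_target": sorted(source - set(target_known)),
        "expected_by_target": sorted(set(target_expected_default) - source),
    }


if __name__ == "__main__":
    # Older -> newer: the source never mapped the SVM flags.
    print(migration_problems({"svm"}, {"svm", "npt", "nrip-save"}, {"npt", "nrip-save"}))
    # Newer -> older: the source XML carries flags the target does not know.
    print(migration_problems({"svm", "npt", "nrip-save"}, {"svm"}, set()))
```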
> >>>
> >>>
> >>> On Tue, Dec 7, 2021 at 10:22 AM Sven Vogel <s.vo...@ewerk.com> wrote:
> >>>
> >>>> Let me check. We had the same problem on RHEL/CentOS, but I am not sure
> >>>> if this is a bug. What I know is that there was a change in the XML. Let
> >>>> me ask one of my colleagues in my team.
> >>>>
> >>>> 😉
> >>>>
> >>>>
> >>>> __
> >>>>
> >>>> Sven Vogel
> >>>> Senior Manager Research and Development - Cloud and Infrastructure
> >>>>
> >>>> From: Gabriel Bräscher <gabrasc...@gmail.com>
> >>>> Date: Tuesday, 7 December 2021 at 09:57
> >>>> To: dev <dev@cloudstack.apache.org>
> >>>> Subject: Re: Live migration between AMD Epyc and Ubuntu 18.04 and 20.04
> >>>>
> >>>> Wei, I agree.
> >>>> This is not necessarily a bug per se.
> >>>>
> >>>> The main point here is: the issue we are seeing is the "bug #1887490"
> >>>> raised in Ubuntu's qemu package.
> >>>> CPU features were added on the newer releases, which caused the
> >>>> compatibility issue when (live) migrating VMs between compatible
> >>>> hardware but different qemu packages.
> >>>>
> >>>>
> >>>> On Tue, Dec 7, 2021 at 9:26 AM Wei ZHOU <ustcweiz...@gmail.com> wrote:
> >>>>
> >>>>> Hi Gabriel,
> >>>>>
> >>>>> In my opinion, migration should work from a lower version to a higher
> >>>>> version, but there is no guarantee from a higher version to a lower
> >>>>> version, just as when we upgrade cloudstack.
> >>>>> Therefore, migration should work from ubuntu 18.04 to ubuntu 20.04, but
> >>>>> it is not a bug if migration fails from ubuntu 20.04 to ubuntu 18.04.
> >>>>>
> >>>>> As Paul said, migration fails from qemu-ev 2.10 to qemu-ev 2.12;
> >>>>> this is definitely a bug from my point of view.
> >>>>>
> >>>>> -Wei
> >>>>>
> >>>>> On Mon, 6 Dec 2021 at 16:05, Gabriel Bräscher <gabrasc...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Paul (& all),
> >>>>>>
> >>>>>> I strongly believe that this is a bug in QEMU.
> >>>>>> I was looking for bugs and found something that looks related to what we
> >>>>>> are seeing, namely Ubuntu's bug #1887490:
> >>>>>> https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1887490
> >>>>>>
> >>>>>> In the link above, there was the following comment:
> >>>>>> https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1887490/comments/53
> >>>>>>
> >>>>>> It seems one of the patches also introduced a regression:
> >>>>>> * lp-1887490-cpu_map-Add-missing-AMD-SVM-features.patch
> >>>>>> adds various SVM-related flags. Specifically npt and nrip-save are now
> >>>>>> expected to be present by default as shown in the updated testdata.
> >>>>>> This however breaks migration from instances using *EPYC* or *EPYC-IBPB*
> >>>>>> CPU models started with libvirt versions prior to this one because the
> >>>>>> instance on the target host has these extra flags
> >>>>>>
> >>>>>>
> >>>>>> More about #1887490
> >>>>>> <https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1887490> can be
> >>>>>> found in the mail
> >>>>>> https://www.mail-archive.com/ubuntu-bugs@lists.ubuntu.com/msg5842376.html.
> >>>>>> We can see that the specific bug was addressed in "linux
> >>>>>> (5.4.0-49.53) focal".
> >>>>>>
> >>>>>> linux (5.4.0-49.53) focal; urgency=medium
> >>>>>>
> >>>>>>    * Add/Backport EPYC-v3 and EPYC-Rome CPU model (LP: #1887490)
> >>>>>>      - kvm: svm: Update svm_xsaves_supported
> >>>>>>
> >>>>>>
> >>>>>> Regards,
> >>>>>> Gabriel.
> >>>>>>
> >>>>>> On Fri, Dec 3, 2021 at 10:59 AM Paul Angus <paul.an...@ticketmaster.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Which version(s) of QEMU are you using Wido?
> >>>>>>>
> >>>>>>> We've just been upgrading CentOS 7.6 to 7.9. Most 7.6 hosts had
> >>>>>>> qemu-ev 2.10 on them (the buggy one); 2.12 was on the new hosts.
> >>>>>>> We were getting errors complaining that the ibpb CPU feature
> >>>>>>> wasn't available when migrating to the updated OS hosts (even
> >>>>>>> though the hardware was identical).
> >>>>>>>
> >>>>>>> Upgrading qemu-ev to 2.12 on the originating host, then stopping
> >>>>>>> and starting the VMs, then allowed us to migrate. We couldn't
> >>>>>>> find any solution that didn't involve stopping and starting the VMs.
> >>>>>>>
> >>>>>>> Paul.
> >>>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Wido den Hollander <w...@widodh.nl>
> >>>>>>> Sent: Monday, November 29, 2021 7:57 AM
> >>>>>>> To: dev@cloudstack.apache.org; Wei ZHOU <ustcweiz...@gmail.com>
> >>>>>>> Subject: Re: Live migration between AMD Epyc and Ubuntu 18.04 and 20.04
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 11/24/21 10:36 PM, Wei ZHOU wrote:
> >>>>>>>> Hi Wido,
> >>>>>>>>
> >>>>>>>> I think it is not good to run an environment with two ubuntu/qemu
> >>>>>>>> versions.
> >>>>>>>> It always happens that some cpu features are supported in the higher
> >>>>>>>> version but not supported in the older version.
> >>>>>>>> From my experience, the migration from older version to higher version
> >>>>>>>> works like a charm, but there were many issues in migration
> >>>>>>>> from higher version to older version.
> >>>>>>>>
> >>>>>>>
> >>>>>>> I understand. But with a large number of hosts, and while working your
> >>>>>>> way through upgrades, you sometimes run into these situations.
> >>>>>>> Therefore it would be welcome if it works.
> >>>>>>>
> >>>>>>>> I do not have a solution for you. I have tried to hack
> >>>>>>>> /etc/libvirt/hooks/qemu but it didn't work.
> >>>>>>>> Have you tried other cpu models like x86_Opteron_G5? You can
> >>>>>>>> find the cpu features of each cpu model in /usr/share/libvirt/cpu_map/
> >>>>>>>>
> >>>>>>>
> >>>>>>> I have not tried that yet, but I can see if that works.
> >>>>>>>
> >>>>>>> The EPYC-IBPB CPU model is identical on 18.04 and 20.04, but even using
> >>>>>>> that model we can't seem to migrate as it complains about the 'npt'
> >>>>>>> feature.
> >>>>>>>
> >>>>>>> Wido
> >>>>>>>
> >>>>>>>> Anyway, even if the vm migration succeeds, you do not know if the vm
> >>>>>>>> works fine. I believe the best solution is upgrading all hosts to the
> >>>>>>>> same OS version.
> >>>>>>>>
> >>>>>>>> -Wei
> >>>>>>>>
> >>>>>>>> On Tue, 23 Nov 2021 at 16:31, Wido den Hollander <w...@widodh.nl> wrote:
> >>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I'm trying to debug an issue with live migrations between Ubuntu
> >>>>>>>>> 18.04 and 20.04 machines, each with different CPUs:
> >>>>>>>>>
> >>>>>>>>> - Ubuntu 18.04 with AMD Epyc 7552 (Rome)
> >>>>>>>>> - Ubuntu 20.04 with AMD Epyc 7662 (Milan)
> >>>>>>>>>
> >>>>>>>>> We are currently using this setting:
> >>>>>>>>>
> >>>>>>>>> guest.cpu.mode=custom
> >>>>>>>>> guest.cpu.model=EPYC
> >>>>>>>>>
> >>>>>>>>> This does not allow for live migrations:
> >>>>>>>>>
> >>>>>>>>> Ubuntu 20.04 with Epyc 7662 to Ubuntu 18.04 with Epyc 7552 fails
> >>>>>>>>>
> >>>>>>>>> "ExecutionException : org.libvirt.LibvirtException: unsupported
> >>>>>>>>> configuration: unknown CPU feature: npt"
> >>>>>>>>>
> >>>>>>>>> So we tried to define a set of features manually:
> >>>>>>>>>
> >>>>>>>>> guest.cpu.features=3dnowprefetch abm adx aes apic arat avx avx2 bmi1
> >>>>>>>>> bmi2 clflush clflushopt cmov cr8legacy cx16 cx8 de f16c fma fpu
> >>>>>>>>> fsgsbase fxsr fxsr_opt lahf_lm lm mca mce misalignsse mmx mmxext
> >>>>>>>>> monitor movbe msr mtrr nx osvw pae pat pclmuldq pdpe1gb pge pni
> >>>>>>>>> popcnt pse pse36 rdrand rdseed rdtscp sep sha-ni smap smep sse sse2
> >>>>>>>>> sse4.1 sse4.2 sse4a ssse3 svm syscall tsc vme xgetbv1 xsave xsavec
> >>>>>>>>> xsaveopt -npt -x2apic -hypervisor -topoext -nrip-save
> >>>>>>>>>
> >>>>>>>>> This results in this going into the XML:
> >>>>>>>>>
> >>>>>>>>> <feature policy='disable' name='npt'/>
> >>>>>>>>>
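
To make the mapping from that setting to the XML concrete, here is a small sketch of how such a feature string could be turned into <feature> elements. Only the '-npt' to policy='disable' case is confirmed by the XML quoted above; using policy='require' for un-prefixed names is an assumption, and the function name is made up, so this is not the agent's actual code.

```python
def features_to_xml(spec):
    """Turn 'svm -npt ...' into libvirt-style <feature> element strings."""
    lines = []
    for token in spec.split():
        if token.startswith("-"):
            # Confirmed by the XML above: a leading '-' becomes 'disable'.
            policy, name = "disable", token[1:]
        else:
            # Assumption for illustration: plain names become 'require'.
            policy, name = "require", token
        lines.append(f"<feature policy='{policy}' name='{name}'/>")
    return lines


if __name__ == "__main__":
    for line in features_to_xml("svm xsave -npt -nrip-save"):
        print(line)
```

Note that even a policy='disable' entry still names the feature in the XML, which is consistent with a target host that does not know 'npt' rejecting it.
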
> >>>>>>>>> You would say that works, but then the target host (18.04 with the
> >>>>>>>>> 7552) says it doesn't support the feature 'npt' and the migration
> >>>>>>>>> still fails.
> >>>>>>>>>
> >>>>>>>>> Now we could of course use the kvm64 CPU from Qemu, but that's lacking
> >>>>>>>>> so many features that, for example, TLS offloading isn't available.
> >>>>>>>>>
> >>>>>>>>> I also tried to set 'EPYC-Rome' on the Ubuntu 20.04 hypervisor, but
> >>>>>>>>> it then complains on the Ubuntu 18.04 hypervisor that the CPU
> >>>>>>>>> 'EPYC-Rome' is unknown as the 18.04 hypervisor doesn't have that
> >>>>>>>>> profile.
> >>>>>>>>>
> >>>>>>>>> Any ideas on how to get this working?
> >>>>>>>>>
> >>>>>>>>> Wido
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
