** Description changed:

  [ Impact ]

  Virtualization users who:

  - Have a Noble VM on top of an Intel bare metal machine, and
  - Create a nested VM (guest) inside this Noble VM (host), and
  - Try to migrate this nested VM (guest) to another Noble VM (host)
    running on the same bare metal machine or a similar one, and
  - Use a "migratable XML" (as generated by "virsh dumpxml --migratable")
    as virsh's "--xml" and "--persistent-xml" arguments

  might encounter issues which prevent the migration from starting.

  These issues are related to CPU feature checks performed by libvirt,
  more specifically features related to "vmx*", which unfortunately have
  been a known source of problems in migration scenarios under libvirt.

  This bug also affects users who created the migratable XML file under
  Noble and are now trying to use it with the libvirt shipped in
  Oracular.

  [ Test Plan ]

- Even though this problem happens only when using nested VMs with Intel
- CPUs, it is still recommended to perform the following tests on a bare
- metal machine also with an Intel CPU. In theory it should be possible
- to reproduce this on a host using an AMD CPU, but you'd have to
- explicitly tell LXD to create VMs with Intel CPUs.
+ This particular problem about VMX happens only when using nested VMs
+ with Intel CPUs, therefore it is recommended to perform the following
+ tests on a bare metal machine also with an Intel CPU. In theory it
+ could be possible to reproduce this on a host using an AMD CPU, but
+ you'd have to explicitly tell LXD to create VMs with virtual Intel CPUs
+ - but they'd likely refuse to expose all the intel virt features as
+ needed.

  Credits to Guillaume Boutry for providing scripts automating most of
  the reproduction steps.
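The "vmx*" features involved in these checks are visible directly in the migratable XML. As a minimal sketch (the sample <cpu> snippet below is fabricated; on a real host you would grep the output of "virsh dumpxml --migratable <domain>" instead):

```shell
# Fabricated sample of the <cpu> feature list found in a migratable XML;
# real input would come from "virsh dumpxml --migratable <domain>".
cat > /tmp/sample-cpu.xml << 'EOF'
<cpu mode="host-model" match="exact">
  <feature policy="require" name="vmx-apicv-xapic"/>
  <feature policy="require" name="vmx-ept"/>
  <feature policy="require" name="ss"/>
</cpu>
EOF
# Count how many of the listed CPU features are vmx* features.
grep -c 'name="vmx' /tmp/sample-cpu.xml
```

On an affected setup, the real migratable XML of a nested-virt domain contains dozens of such "vmx-*" entries, which is what makes the feature-count comparison so fragile.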
  Let's create two Noble VMs using LXD:

  $ lxc launch ubuntu:noble --vm --config limits.cpu=4 --config limits.memory=8GiB -d root,size=80GiB libvirt-1
  $ lxc launch ubuntu:noble --vm --config limits.cpu=4 --config limits.memory=8GiB -d root,size=80GiB libvirt-2

  You will need to generate an SSH keypair for the "ubuntu" user on
  libvirt-1 and install the public key on libvirt-2 so that "ssh
  libvirt-2.lxd" works. The rest of this test plan assumes you have done
  that.

  Inside libvirt-1:

  # apt update
  # apt install -y libvirt-daemon-system uuid
  # echo "host_uuid = \"00000000-0000-0000-0000-$(printf "%012x" "${RANDOM}")\"" >> /etc/libvirt/libvirtd.conf
  # systemctl restart libvirtd.service
  # su - ubuntu
  $ cd /tmp
  $ wget http://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img
  $ sudo chown libvirt-qemu:kvm noble-server-cloudimg-amd64.img
  $ cd
  $ cat > domain.xml << _EOF_
  <domain type="kvm">
    <uuid>$(uuidgen)</uuid>
    <name>test-domain</name>
    <memory>1048576</memory>
    <vcpu>2</vcpu>
    <os>
      <type arch="x86_64" machine="pc">hvm</type>
      <boot dev="hd"/>
    </os>
    <features>
      <acpi/>
      <apic/>
      <vmcoreinfo/>
    </features>
    <clock offset="utc">
      <timer name="pit" tickpolicy="delay"/>
      <timer name="rtc" tickpolicy="catchup"/>
      <timer name="hpet" present="no"/>
    </clock>
    <cpu mode="host-model" match="exact">
      <topology sockets="2" cores="1" threads="1"/>
    </cpu>
    <devices>
      <disk type="file" device="disk">
        <driver name="qemu" type="qcow2" cache="none"/>
        <source file="/tmp/noble-server-cloudimg-amd64.img"/>
        <target dev="vda" bus="virtio"/>
      </disk>
      <video>
        <model type="qxl"/>
      </video>
      <rng model="virtio">
        <backend model="random">/dev/urandom</backend>
      </rng>
      <controller type="usb" index="0" model="none"/>
      <memballoon model="virtio">
        <stats period="10"/>
      </memballoon>
    </devices>
  </domain>
  _EOF_
  $ virsh define domain.xml
  $ virsh start test-domain
  $ virsh dumpxml --migratable test-domain > migratable.xml

  Inside libvirt-2:

  # apt update
  # apt install -y libvirt-daemon-system
  # cd /tmp
  # wget http://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img
  # chown libvirt-qemu:kvm noble-server-cloudimg-amd64.img
  # cd

  Now, back to libvirt-1, we are ready to test the migration:

  $ virsh migrate test-domain qemu+ssh://libvirt-2.lxd/system --live --persistent --undefinesource --copy-storage-inc --migrate-disks vda --persistent-xml migratable.xml --xml migratable.xml

  On Noble, you should see the following error:

  error: unsupported configuration: Target CPU feature count 28 does not match source 96

  [ Where problems could occur ]

  As described below (in the "Other information" section), this SRU is
  different between Noble and Oracular.

  For Noble, the chances of regression are higher because it involves
  updating a sizeable patch series before actually backporting the
  patches to fix the bug. Feature-wise, this update should not change
  anything, and a review has been performed to make sure that, to the
  best of our knowledge, no user-facing changes are introduced.

  For Oracular, all that needed to be done was backporting the patches
  that fix the issue. The patches themselves are not complex and have
  been part of RHEL's libvirt for a while now, without any regressions.

  There is always the possibility that some unwanted regression is
  introduced, but our internal migration testsuite has not caught any
  problems.

  [ Other information ]

  For Noble, this SRU involves two steps:

  1) Updating an existing patch series (which was backported in order to
     fix bug #2051754). This is needed because the patch series was
     backported directly from the patches posted at upstream's mailing
     list. The series has since been accepted and pushed to the upstream
     git repository, and although it is exactly the same feature-wise,
     there were some minor cosmetic changes made to function names which
     can affect future backports that touch the same code (as is the
     case here).

  2) Actually backporting the patches that fix the issue.
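When debugging the "feature count" mismatch reported above, it can help to see exactly which features the source requires that the destination lacks. A small sketch with fabricated feature lists (in practice each list would be extracted from "virsh dumpxml --migratable" on the respective host):

```shell
# Hypothetical, pre-sorted feature lists for source and destination hosts;
# real lists would be extracted from each host's migratable domain XML.
cat > /tmp/src-features.txt << 'EOF'
vmx-ept
vmx-invept
vmx-vpid
EOF
cat > /tmp/dst-features.txt << 'EOF'
vmx-ept
EOF
# comm -23 prints lines unique to the first (source) file, i.e. the
# features the destination is missing. Both inputs must be sorted.
comm -23 /tmp/src-features.txt /tmp/dst-features.txt
```

Here the sketch reports vmx-invept and vmx-vpid as missing on the destination; on an affected deployment the missing entries are dozens of vmx-* features.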
  Oracular was simpler because the patchset from step (1) was already
  present in the release.

  [ Original Description ]

  This issue is reproduced consistently with the snap-openstack-hypervisor
  built from
  https://git.launchpad.net/ubuntu/+source/libvirt@ubuntu/noble-updates
  (with patches applied).

  When creating a nova instance, live migrating between two hosts always
  fails because of:

  error: unsupported configuration: Target CPU feature count 44 does not match source 109

  Command that reproduces a Nova migration using the libvirt client (and
  reproduces the same error):

  virsh migrate instance-00000002 qemu+tls://juju-596fd1-1.lxd/system --live --p2p --persistent --undefinesource --copy-storage-inc --migrate-disks vda --xml migratable.xml --persistent-xml migratable.xml --bandwidth 0

  Attached to this bug you will find:

  - instance.xml: domain dumped through virsh
  - migratable.xml: domain dumped through virsh using --migratable (same
    flags as the nova-updated xml)
  - libvirtd.log: libvirt daemon debug logs showcasing why it refused to
    migrate

  As you can see in the logs from libvirtd.log, the method
  virDomainDefCheckABIStabilityFlags fails because the source has 65
  additional VMX features that are not found on the destination. (Both
  hypervisors are hosted on LXD VMs on the same physical machine, i.e.
  same CPU flags.)
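The counts in the error message ("feature count 44 does not match source 109") correspond, roughly, to the number of <feature> elements libvirt finds under <cpu> on each side of the migration. A quick way to reproduce such a count from a domain XML (the snippet below is fabricated; the real input would be the attached instance.xml or migratable.xml):

```shell
# Fabricated <cpu> fragment standing in for a real domain XML dump.
cat > /tmp/cpu-snippet.xml << 'EOF'
<cpu mode="custom" match="exact">
  <model fallback="forbid">Skylake-Client-IBRS</model>
  <feature policy="require" name="ss"/>
  <feature policy="require" name="vmx"/>
  <feature policy="require" name="vmx-ept"/>
  <feature policy="disable" name="mpx"/>
</cpu>
EOF
# Count the <feature> elements, mirroring the per-side count libvirt compares.
grep -c '<feature' /tmp/cpu-snippet.xml
```

Comparing this count between the source XML and the XML the destination reconstructs shows the same asymmetry the ABI-stability check complains about.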
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2083986

Title:
  Live migration fails because VMX features are missing on target cpu
  definition

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/2083986/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs