Reviewed:  https://review.opendev.org/c/openstack/nova/+/909806
Committed: https://opendev.org/openstack/nova/commit/c1ccc1a3165ec1556c605b3b036274e992b0a09d
Submitter: "Zuul (22348)"
Branch:    master

commit c1ccc1a3165ec1556c605b3b036274e992b0a09d
Author: Artom Lifshitz <alifs...@redhat.com>
Date:   Wed Feb 21 19:58:32 2024 -0500

    pwr mgmt: handle live migrations correctly

    Previously, live migrations completely ignored CPU power management.
    This patch makes sure that we correctly:

    * Power up the cores on the destination during pre_live_migration, as
      we need them powered up before the instance starts on the
      destination.
    * If the live migration is successful, power down the vacated cores
      on the source.
    * In case of a rollback, power down the cores previously powered up
      on pre_live_migration.

    Closes-bug: 2056613
    Change-Id: I787bd7807950370cd865f29b95989d489d4826d0

** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2056613

Title:
  libvirt CPU power management does not support live migration

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Description
  ===========

  libvirt CPU power management does not support live migration

  Steps to reproduce
  ==================

  1. Turn on libvirt CPU power management
  2. Boot an instance with hw:cpu_policy=dedicated
  3. Live migrate the instance

  Expected result
  ===============

  Live migration succeeds.

  Actual result
  =============

  Live migration fails with the following libvirt error in the source
  nova-compute logs:

  [instance: afdd5e62-2a97-4b58-a7e7-bb92152f4165] Migration operation thread notification {{(pid=103809) thread_finished /opt/stack/nova/nova/virt/libvirt/driver.py:10668}}
  Feb 21 19:21:15.045216 np0036828692 nova-compute[103809]: Traceback (most recent call last):
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/hubs/hub.py", line 471, in fire_timers
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     timer()
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/hubs/timer.py", line 59, in __call__
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     cb(*args, **kw)
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/event.py", line 173, in _do_send
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     waiter.switch(result)
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/greenthread.py", line 264, in main
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     result = function(*args, **kwargs)
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/nova/nova/utils.py", line 664, in context_wrapper
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     return func(*args, **kwargs)
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 10322, in _live_migration_operation
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     with excutils.save_and_reraise_exception():
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/data/venv/lib/python3.10/site-packages/oslo_utils/excutils.py", line 227, in __exit__
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     self.force_reraise()
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/data/venv/lib/python3.10/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     raise self.value
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 10311, in _live_migration_operation
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     guest.migrate(self._live_migration_uri(dest),
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/nova/nova/virt/libvirt/guest.py", line 648, in migrate
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     self._domain.migrateToURI3(
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/tpool.py", line 186, in doit
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     result = proxy_call(self._autowrap, f, *args, **kwargs)
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/tpool.py", line 144, in proxy_call
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     rv = execute(f, *args, **kwargs)
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/tpool.py", line 125, in execute
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     raise e.with_traceback(tb)
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/tpool.py", line 82, in tworker
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     rv = meth(*args, **kwargs)
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:   File "/usr/lib/python3/dist-packages/libvirt.py", line 2126, in migrateToURI3
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]:     raise libvirtError('virDomainMigrateToURI3() failed')
  Feb 21 19:21:15.045387 np0036828692 nova-compute[103809]: libvirt.libvirtError: cannot set CPU affinity on process 48279: Invalid argument

  Environment
  ===========

  This was originally noticed in a whitebox CI job [1] on devstack
  master.

  Additional info
  ===============

  Regardless of whether NUMA live migration has changed the underlying
  CPU pinnings, it's necessary to make sure the cores are powered up on
  the destination, otherwise libvirt attempts to pin the instance to an
  offline core. Nova doesn't handle that. With some refactoring to the
  code itself, it's possible to observe the cores not being powered on
  in functional tests.

  [1] https://zuul.opendev.org/t/openstack/build/532b30767df54147a01508e7616930f5/logs

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2056613/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
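
For illustration, the power-state ordering described in the commit message (power up the destination cores in pre_live_migration, power down the source cores on success, power down the destination cores on rollback) can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions, not nova's code: FakeHost, live_migrate and the migrate callback are hypothetical names, and pin() only imitates libvirt refusing to set affinity to an offline core.

```python
# Illustrative model only. FakeHost and live_migrate are hypothetical
# names, not nova classes or functions.


class FakeHost:
    """Tracks which dedicated cores are powered up on a compute host."""

    def __init__(self, name, online=()):
        self.name = name
        self.online = set(online)

    def power_up(self, cores):
        self.online |= set(cores)

    def power_down(self, cores):
        self.online -= set(cores)

    def pin(self, cores):
        # Stands in for libvirt rejecting affinity to an offline core.
        offline = set(cores) - self.online
        if offline:
            raise RuntimeError(
                f"cannot set CPU affinity on {self.name}: "
                f"cores {sorted(offline)} are offline")


def live_migrate(source, dest, src_cores, dest_cores, migrate):
    """Call migrate() with cores powered in the order the commit lists."""
    # pre_live_migration: destination cores must be up before the
    # instance starts there.
    dest.power_up(dest_cores)
    try:
        dest.pin(dest_cores)
        migrate()
    except Exception:
        # Rollback: power down what pre_live_migration powered up.
        dest.power_down(dest_cores)
        raise
    # Success: the source cores have been vacated.
    source.power_down(src_cores)
```

In this model, skipping the power_up() call makes pin() fail in the same way the traceback above does, which is the behaviour the bug describes.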