Reviewed:  https://review.opendev.org/c/openstack/nova/+/784168
Committed: https://opendev.org/openstack/nova/commit/00f1d4757e503bb9807d7a8d7035c061a97db983
Submitter: "Zuul (22348)"
Branch:    master
commit 00f1d4757e503bb9807d7a8d7035c061a97db983
Author: Artom Lifshitz <alifs...@redhat.com>
Date:   Wed Mar 31 16:57:35 2021 -0400

    Update SRIOV port pci_slot when unshelving

    There are a few things we need to do to make that work:

    * Always set the PCIRequest's requester_id. Previously, this was only
      done for resource requests. The requester_id is the port UUID, so we
      can use that to correlate which port to update with which pci_slot
      (in the case of multiple SRIOV ports per instance). This has the
      side effect of making the fix work only for instances created
      *after* this patch has been applied. It's not ideal, but there does
      not appear to be a better way.

    * Call setup_networks_on_host() within the instance_claim context.
      This means the instance's pci_devices are updated when we call it,
      allowing us to get the pci_slot information from them.

    With the two previous changes in place, we can figure out the port's
    new pci_slot in _update_port_binding_for_instance().

    Closes: bug 1851545
    Change-Id: Icfa8c1d6e84eab758af6223a2870078685584aaa

** Changed in: nova
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1851545

Title:
  Port update exception on nova unshelve for instance with PCI devices
  (part 2)

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Description
  ===========
  When unshelving an instance with PCI devices, and another instance is
  already using the PCI device(s) that the unshelved instance was
  initially scheduled with, we get an exception.

  Steps to reproduce
  ==================
  - Create an instance with SR-IOV
  - Shelve the instance
  - Unshelve the instance on a compute node with the same PCI device(s)
    already in use

  Expected result
  ===============
  We should recalculate the PCI mapping to use new PCI device(s).

  Actual result
  =============
  Nova compute fails with the traceback in [a].
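To illustrate the idea in the commit message, here is a simplified sketch (the class and function names are illustrative, not nova's actual internals): once PciRequest.requester_id is always set to the port UUID, the PCI devices freshly claimed during unshelve can be matched back to the right Neutron port when updating its binding.

```python
# Hypothetical sketch of correlating ports to claimed PCI devices via
# requester_id. Not nova's real code; names are illustrative only.
from dataclasses import dataclass


@dataclass
class PciRequest:
    request_id: str     # links a request to the device claimed for it
    requester_id: str   # the port UUID (always set after the fix)


@dataclass
class PciDevice:
    request_id: str
    address: str        # pci_slot, e.g. '0000:5d:17.6'


def pci_slot_for_port(port_uuid, pci_requests, pci_devices):
    """Return the pci_slot newly claimed for the given port, or None."""
    for req in pci_requests:
        if req.requester_id == port_uuid:
            for dev in pci_devices:
                if dev.request_id == req.request_id:
                    return dev.address
    return None


requests = [PciRequest('r1', 'port-a'), PciRequest('r2', 'port-b')]
devices = [PciDevice('r1', '0000:5d:10.1'), PciDevice('r2', '0000:5d:10.2')]
print(pci_slot_for_port('port-b', requests, devices))  # 0000:5d:10.2
```

With multiple SR-IOV ports per instance, this correlation is what lets each port's binding:profile be updated with the correct new pci_slot rather than a stale one.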
  This analysis was made when testing with Newton, but it's the same
  problem on supported upstream branches, at least up to Queens.

  - When we have a failure, we see "Updating port
    991cbd39-47f7-4cab-bf65-0c19a920a718 with attributes
    {'binding:host_id': 'xxx'}", which brings us here [1].
  - When we look below [2], we see that the PCI devices are never
    recalculated and the profile is not updated with new devices when we
    unshelve, because this only happens in the case of a migration.
  - That brings us back to this commit [3] and this upstream bug [4].
  - I would assume that if we remove the "migration is not None" test,
    we will fail with this bug [4], because we get the pci_mapping from
    a migration object.

  Now I'm not sure how to generate the pci_mapping without a migration
  object/context.

  [1] https://github.com/openstack/nova/blob/newton-eol/nova/network/neutronv2/api.py#L2405-L2411
  [2] https://github.com/openstack/nova/blob/newton-eol/nova/network/neutronv2/api.py#L2417-L2418
  [3] https://github.com/openstack/nova/commit/70c1eb689ad174b61ad915ae5384778bd536c16c
  [4] https://bugs.launchpad.net/nova/+bug/1677621/

  Logs & Configs
  ==============
  [a]
  ~~~
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] Traceback (most recent call last):
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 4386, in _unshelve_instance
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     block_device_info=block_device_info)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2742, in spawn
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     destroy_disks_on_failure=True)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5121, in _create_domain_and_network
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     destroy_disks_on_failure)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     self.force_reraise()
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     six.reraise(self.type_, self.value, self.tb)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5093, in _create_domain_and_network
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     post_xml_callback=post_xml_callback)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5011, in _create_domain
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     guest.launch(pause=pause)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 144, in launch
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     self._encoded_xml, errors='ignore')
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     self.force_reraise()
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     six.reraise(self.type_, self.value, self.tb)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 139, in launch
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     return self._domain.createWithFlags(flags)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     result = proxy_call(self._autowrap, f, *args, **kwargs)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     rv = execute(f, *args, **kwargs)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     six.reraise(c, e, tb)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     rv = meth(*args, **kwargs)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in createWithFlags
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17]     if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
  nova-compute.log:2019-10-31 20:31:48.216 680184 ERROR nova.compute.manager [instance: 4fd6c244-238c-4e75-a856-3713163f4d17] libvirtError: Requested operation is not valid: PCI device 0000:5d:17.6 is in use by driver QEMU, domain instance-000024b0
  ~~~

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1851545/+subscriptions
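The analysis in the bug description boils down to the port-update code path only refreshing pci_slot when a migration is in flight. The following is a rough, illustrative paraphrase of that behaviour (not the literal nova source from [2]/[3]; names and shapes are assumptions):

```python
# Illustrative paraphrase of the buggy behaviour described in [2]/[3]:
# the binding:profile pci_slot was only refreshed when a migration object
# was present, so a plain unshelve kept the stale pre-shelve pci_slot.
def port_update_attributes(host, migration, pci_mapping):
    """Build the attribute dict sent to Neutron for a port update."""
    updates = {'binding:host_id': host}
    if migration is not None:
        # Only reached for migrations: swap the old pci_slot for the new
        # one from the (old_slot, new_slot) mapping.
        old_slot, new_slot = pci_mapping
        updates['binding:profile'] = {'pci_slot': new_slot}
    return updates


# Unshelve passes no migration, so only binding:host_id is updated and
# the guest XML later references a PCI address another domain now owns.
print(port_update_attributes('xxx', None, None))
```

This matches the log line quoted above ("Updating port ... with attributes {'binding:host_id': 'xxx'}") and explains why libvirt then refuses to start the domain with the already-claimed PCI device.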