** Also affects: cloud-archive/yoga
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/zed
   Importance: Undecided
       Status: New

** Changed in: cloud-archive/zed
       Status: New => Fix Released

https://bugs.launchpad.net/bugs/1972028

Title:
  _get_pci_passthrough_devices prone to race condition

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive yoga series:
  New
Status in Ubuntu Cloud Archive zed series:
  Fix Released
Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  At the moment, the `_get_pci_passthrough_devices` function is prone to
  a race condition. The code linked below calls `listCaps()` on every
  node device that was enumerated earlier, but a device may have
  disappeared by the time the method is called:

  https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L7949-L7959

  This results in the following traceback:

  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager [req-51b7c1c4-2b4a-46cc-9baa-8bf61801c48d - - - - -] Error updating resources for node <snip>.: libvirt.libvirtError: Node device not found: no node device with matching name 'net_tap8b08ec90_e5_fe_16_3e_0f_0a_d4'
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager Traceback (most recent call last):
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/compute/manager.py", line 9946, in _update_available_resource_for_node
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     self.rt.update_available_resource(context, nodename,
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/compute/resource_tracker.py", line 879, in update_available_resource
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     resources = self.driver.get_available_resource(nodename)
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", line 8937, in get_available_resource
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     data['pci_passthrough_devices'] = self._get_pci_passthrough_devices()
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", line 7663, in _get_pci_passthrough_devices
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     vdpa_devs = [
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", line 7664, in <listcomp>
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     dev for dev in devices.values() if "vdpa" in dev.listCaps()
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/libvirt.py", line 6276, in listCaps
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     raise libvirtError('virNodeDeviceListCaps() failed')
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager libvirt.libvirtError: Node device not found: no node device with matching name 'net_tap8b08ec90_e5_fe_16_3e_0f_0a_d4'
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager

  I think the cleaner fix is to loop over all the devices explicitly and
  skip any device whose `listCaps()` call raises a "device not found"
  error, instead of letting a single vanished device abort the whole
  resource update.
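The "loop and skip" approach suggested in the bug description could look roughly like the sketch below. It is only an illustration, not the patch that was merged into nova. To keep it runnable without libvirt installed, it uses hypothetical stand-ins (`FakeLibvirtError`, `FakeNodeDevice`, and the placeholder `NO_NODE_DEVICE`); with the real bindings the exception would be `libvirt.libvirtError` and the error code to compare against would be `libvirt.VIR_ERR_NO_NODE_DEVICE`. The helper name `devices_with_cap` and the device names other than the tap device from the traceback are also made up for the example.

```python
# Stand-ins so the sketch runs without libvirt. In real code, catch
# libvirt.libvirtError and compare get_error_code() against
# libvirt.VIR_ERR_NO_NODE_DEVICE instead.
NO_NODE_DEVICE = "no-node-device"  # hypothetical placeholder value


class FakeLibvirtError(Exception):
    """Hypothetical stand-in for libvirt.libvirtError."""

    def __init__(self, message, code):
        super().__init__(message)
        self._code = code

    def get_error_code(self):
        return self._code


class FakeNodeDevice:
    """Hypothetical stand-in for a libvirt node device object."""

    def __init__(self, name, caps, gone=False):
        self.name = name
        self._caps = caps
        self._gone = gone  # simulates a device removed after enumeration

    def listCaps(self):
        if self._gone:
            raise FakeLibvirtError(
                "Node device not found: no node device with matching "
                "name '%s'" % self.name,
                NO_NODE_DEVICE,
            )
        return list(self._caps)


def devices_with_cap(devices, cap):
    """Return devices advertising `cap`, skipping ones that vanished."""
    matching = []
    for dev in devices.values():
        try:
            caps = dev.listCaps()
        except FakeLibvirtError as exc:
            if exc.get_error_code() == NO_NODE_DEVICE:
                continue  # device disappeared since it was listed
            raise  # any other failure is still a real error
        if cap in caps:
            matching.append(dev)
    return matching


devices = {
    "vdpa_vdpa0": FakeNodeDevice("vdpa_vdpa0", ["vdpa"]),
    "pci_0000_00_1f_0": FakeNodeDevice("pci_0000_00_1f_0", ["pci"]),
    "net_tap8b08ec90_e5_fe_16_3e_0f_0a_d4": FakeNodeDevice(
        "net_tap8b08ec90_e5_fe_16_3e_0f_0a_d4", ["net"], gone=True
    ),
}

vdpa_devs = devices_with_cap(devices, "vdpa")
print([dev.name for dev in vdpa_devs])
```

Checking the error code, rather than swallowing every libvirt error, keeps unrelated failures (for example a lost connection to libvirtd) visible while tolerating the short-lived tap devices that trigger this race.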