** Also affects: cloud-archive/yoga
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/zed
   Importance: Undecided
       Status: New

** Changed in: cloud-archive/zed
       Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1972028

Title:
  _get_pci_passthrough_devices prone to race condition

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive yoga series:
  New
Status in Ubuntu Cloud Archive zed series:
  Fix Released
Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  At the moment, the `_get_pci_passthrough_devices` function is prone to
  race conditions.

  This specific code calls `listCaps()` on each device; however, a
  device may have disappeared by the time the method is called:

  https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L7949-L7959

  Which would result in the following traceback:

  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager [req-51b7c1c4-2b4a-46cc-9baa-8bf61801c48d - - - - -] Error updating resources for node <snip>.: libvirt.libvirtError: Node device not found: no node device with matching name 'net_tap8b08ec90_e5_fe_16_3e_0f_0a_d4'
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager Traceback (most recent call last):
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/compute/manager.py", line 9946, in _update_available_resource_for_node
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     self.rt.update_available_resource(context, nodename,
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/compute/resource_tracker.py", line 879, in update_available_resource
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     resources = self.driver.get_available_resource(nodename)
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", line 8937, in get_available_resource
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     data['pci_passthrough_devices'] = self._get_pci_passthrough_devices()
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", line 7663, in _get_pci_passthrough_devices
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     vdpa_devs = [
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/nova/virt/libvirt/driver.py", line 7664, in <listcomp>
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     dev for dev in devices.values() if "vdpa" in dev.listCaps()
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager   File "/var/lib/openstack/lib/python3.8/site-packages/libvirt.py", line 6276, in listCaps
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager     raise libvirtError('virNodeDeviceListCaps() failed')
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager libvirt.libvirtError: Node device not found: no node device with matching name 'net_tap8b08ec90_e5_fe_16_3e_0f_0a_d4'
  2022-05-06 20:16:16.110 4053032 ERROR nova.compute.manager

  I think a cleaner approach is to loop over the devices explicitly and
  skip any device whose `listCaps()` call raises a "device not found"
  error.
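
  A minimal sketch of that idea follows. It is not the actual nova patch.
  So that it runs without libvirt installed, it uses stand-ins that are
  assumptions of this example: `FakeLibvirtError`, `FakeNodeDevice`,
  `NO_NODE_DEVICE` and the helper `devices_with_cap` are all hypothetical
  names. Real code would presumably catch `libvirt.libvirtError` and
  compare `get_error_code()` against `libvirt.VIR_ERR_NO_NODE_DEVICE`.

```python
# Hypothetical stand-in; real code would use libvirt.VIR_ERR_NO_NODE_DEVICE.
NO_NODE_DEVICE = "no-node-device"


class FakeLibvirtError(Exception):
    """Stand-in for libvirt.libvirtError, exposing get_error_code()."""

    def __init__(self, message, code):
        super().__init__(message)
        self._code = code

    def get_error_code(self):
        return self._code


class FakeNodeDevice:
    """Stand-in for a libvirt node device that may vanish at any time."""

    def __init__(self, name, caps, gone=False, code=NO_NODE_DEVICE):
        self.name = name
        self._caps = caps
        self._gone = gone
        self._code = code

    def listCaps(self):
        if self._gone:
            raise FakeLibvirtError("Node device not found", self._code)
        return list(self._caps)


def devices_with_cap(devices, cap):
    """Return the devices advertising `cap`, skipping vanished devices."""
    matching = []
    for dev in devices.values():
        try:
            caps = dev.listCaps()
        except FakeLibvirtError as exc:
            if exc.get_error_code() == NO_NODE_DEVICE:
                # The device disappeared after it was listed; ignore it.
                continue
            # Any other failure is unexpected and should still propagate.
            raise
        if cap in caps:
            matching.append(dev)
    return matching


if __name__ == "__main__":
    devices = {
        "pci_0000_06_00_2": FakeNodeDevice("pci_0000_06_00_2", ["pci", "vdpa"]),
        "net_tap_example": FakeNodeDevice("net_tap_example", ["net"], gone=True),
    }
    print([dev.name for dev in devices_with_cap(devices, "vdpa")])
```

  Only the "not found" case is swallowed; other errors are re-raised so
  that real failures still show up in the resource update.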

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1972028/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
