Reviewed:  https://review.opendev.org/c/openstack/nova/+/777679
Committed: https://opendev.org/openstack/nova/commit/a569a51fedd058fdae2eb0066e087c37688987f8
Submitter: "Zuul (22348)"
Branch:    master

commit a569a51fedd058fdae2eb0066e087c37688987f8
Author: Sean Mooney <w...@seanmooney.info>
Date:   Fri May 21 14:45:45 2021 +0100

    fix sr-iov support on Cavium ThunderX hosts.

    This change is a partial revert of
    Ibf8dca4bd57b3bddb39955b53cc03564506f5754
    to reintroduce a try-except which is required for
    some non-standard hardware.

    On the Cavium ThunderX platform, it's possible to have
    virtual functions which are netdevs which are not associated
    to a PF. This causes the PF name lookup to fail.
    Prior to Ibf8dca4bd57b3bddb39955b53cc03564506f5754 when
    the lookup failed it was caught and we skipped populating
    the parent PF interface name. This change restores that behavior.

    Closes-Bug: #1915255
    Change-Id: Ia10ccdd9fbed3870d0592e3cbbff17f292651dd2

** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1915255

Title:
  [Victoria] nova-compute won't start on aarch64 - raises
  PciDeviceNotFoundById

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) victoria series:
  Triaged

Bug description:

Description
===========

When deploying OpenStack Victoria on Ubuntu 20.04 (Focal) on
arm64/aarch64, nova-compute 22.0.1 fails to start with
(nova-compute.log):

----------
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nova/pci/utils.py", line 156, in get_ifname_by_pci_address
    dev_info = os.listdir(dev_path)
FileNotFoundError: [Errno 2] No such file or directory: '/sys/bus/pci/devices/0002:01:00.1/physfn/net'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 9823, in _update_available_resource_for_node
    self.rt.update_available_resource(context, nodename,
  File "/usr/lib/python3/dist-packages/nova/compute/resource_tracker.py", line 880, in update_available_resource
    resources = self.driver.get_available_resource(nodename)
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 8473, in get_available_resource
    data['pci_passthrough_devices'] = self._get_pci_passthrough_devices()
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 7223, in _get_pci_passthrough_devices
    pci_info = [self._get_pcidev_info(name, dev, net_devs) for name, dev
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 7223, in <listcomp>
    pci_info = [self._get_pcidev_info(name, dev, net_devs) for name, dev
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 7199, in _get_pcidev_info
    device.update(_get_device_type(cfgdev, address, dev, net_devs))
  File "/usr/lib/python3/dist-packages/nova/virt/libvirt/driver.py", line 7154, in _get_device_type
    parent_ifname = pci_utils.get_ifname_by_pci_address(
  File "/usr/lib/python3/dist-packages/nova/pci/utils.py", line 159, in get_ifname_by_pci_address
    raise exception.PciDeviceNotFoundById(id=pci_addr)
nova.exception.PciDeviceNotFoundById: PCI device 0002:01:00.1 not found
----------

This results in an empty `openstack hypervisor list`.

This does not happen with OpenStack Ussuri (nova-compute 21.1.0). We
also haven't seen this on other architectures (yet?).

This code actually appeared between Ussuri and Victoria [0], i.e. the
first version having it is 22.0.0.

$ lspci | grep 0002:01:00.1
0002:01:00.1 Ethernet controller: Cavium, Inc. THUNDERX Network Interface Controller virtual function (rev 09)

Indeed, /sys/bus/pci/devices/0002:01:00.1/physfn/ doesn't contain `net`,
but I'm not sure if that's really a problem or if nova-compute should
just catch the exception and move on.

A similar issue in the past [1] shows that this might be an issue
specific to the Cavium Thunder X NIC.

Related issue: [2]

Steps to reproduce
==================

Install and run nova >= 22.0.0 on an aarch64 machine (with a Cavium
Thunder X NIC if possible).
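
To check whether a given host is likely to hit this, the sysfs lookup that fails in the traceback can be reproduced outside nova. The following is only a rough sketch, not nova's code: the function names `parent_pf_ifname` and `vfs_without_pf_netdev` are made up for illustration, and it assumes the usual Linux sysfs layout in which a virtual function has a `physfn` link and a PF with a network interface has a `net` directory. Returning `None` instead of raising mirrors the "catch the failure and skip the parent PF name" behavior described in the commit message.

```python
import os
from typing import List, Optional

SYSFS_PCI_DEVICES = "/sys/bus/pci/devices"


def parent_pf_ifname(pci_addr: str,
                     sysfs_root: str = SYSFS_PCI_DEVICES) -> Optional[str]:
    """Return the netdev name of a VF's parent PF, or None if there is none.

    Hypothetical helper: it tolerates a missing physfn/net directory
    instead of raising, which is the case reported in this bug.
    """
    net_dir = os.path.join(sysfs_root, pci_addr, "physfn", "net")
    try:
        names = os.listdir(net_dir)
    except OSError:
        # No physfn/net entry: the PF exposes no netdev for this VF.
        return None
    return sorted(names)[0] if names else None


def vfs_without_pf_netdev(sysfs_root: str = SYSFS_PCI_DEVICES) -> List[str]:
    """List PCI addresses that have a physfn link but no PF netdev."""
    try:
        addrs = sorted(os.listdir(sysfs_root))
    except OSError:
        # Not a Linux host, or sysfs is not mounted.
        return []
    return [
        addr for addr in addrs
        if os.path.exists(os.path.join(sysfs_root, addr, "physfn"))
        and parent_pf_ifname(addr, sysfs_root) is None
    ]


if __name__ == "__main__":
    for addr in vfs_without_pf_netdev():
        print(addr)
```

On a host like the one described here, 0002:01:00.1 would be expected to show up in the output, since its physfn/ directory has no `net` entry. An empty result suggests the host is not affected by this particular lookup failure.
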

I personally use Juju [3] for deploying an entire OpenStack Victoria
setup to a lab:

$ git clone https://github.com/openstack-charmers/openstack-bundles
$ cd openstack-bundles/development/openstack-base-focal-victoria/
$ juju deploy ./bundle.yaml

Expected result
===============

`openstack hypervisor list` shows at least one hypervisor.

nova-compute.log doesn't contain nova.exception.PciDeviceNotFoundById

Actual result
=============

`openstack hypervisor list` doesn't show any hypervisor.

nova-compute.log contains nova.exception.PciDeviceNotFoundById

Environment
===========

$ dpkg -l | grep nova
ii  nova-api-metadata     2:22.0.1-0ubuntu1~cloud0  all  OpenStack Compute - metadata API frontend
ii  nova-common           2:22.0.1-0ubuntu1~cloud0  all  OpenStack Compute - common files
ii  nova-compute          2:22.0.1-0ubuntu1~cloud0  all  OpenStack Compute - compute node base
ii  nova-compute-kvm      2:22.0.1-0ubuntu1~cloud0  all  OpenStack Compute - compute node (KVM)
ii  nova-compute-libvirt  2:22.0.1-0ubuntu1~cloud0  all  OpenStack Compute - compute node libvirt support
ii  python3-nova          2:22.0.1-0ubuntu1~cloud0  all  OpenStack Compute Python 3 libraries
ii  python3-novaclient    2:17.2.1-0ubuntu1~cloud0  all  client library for OpenStack Compute API - 3.x

# cat /etc/nova/nova-compute.conf
[DEFAULT]
compute_driver=libvirt.LibvirtDriver
[libvirt]
virt_type=kvm

$ dpkg -l | grep libvirt
ii  libvirt-clients                    6.0.0-0ubuntu8.5          arm64  Programs for the libvirt library
ii  libvirt-daemon                     6.0.0-0ubuntu8.5          arm64  Virtualization daemon
ii  libvirt-daemon-driver-qemu         6.0.0-0ubuntu8.5          arm64  Virtualization daemon QEMU connection driver
ii  libvirt-daemon-driver-storage-rbd  6.0.0-0ubuntu8.5          arm64  Virtualization daemon RBD storage driver
ii  libvirt-daemon-system              6.0.0-0ubuntu8.5          arm64  Libvirt daemon configuration files
ii  libvirt-daemon-system-systemd      6.0.0-0ubuntu8.5          arm64  Libvirt daemon configuration files (systemd)
ii  libvirt0:arm64                     6.0.0-0ubuntu8.5          arm64  library for interfacing with different virtualization systems
ii  nova-compute-libvirt               2:22.0.1-0ubuntu1~cloud0  all    OpenStack Compute - compute node libvirt support
ii  python3-libvirt                    6.1.0-1                   arm64  libvirt Python 3 bindings

This shouldn't be relevant but:

* Ceph 15.2.7 for storage
* Neutron with OVN

Logs & Configs
==============

sosreport attached.

[0] https://opendev.org/openstack/nova/commit/efc27ff84c3
[1] https://bugs.launchpad.net/charm-nova-compute/+bug/1771662
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1724999
[3] https://jaas.ai/openstack-base

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1915255/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp