Reviewed:  https://review.openstack.org/342301
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3a61ae35d4b713f423219c7b714126e1584694e8
Submitter: Jenkins
Branch:    master

commit 3a61ae35d4b713f423219c7b714126e1584694e8
Author: Matt Riedemann <[email protected]>
Date:   Thu Jul 14 13:37:05 2016 -0400

    Validate pci_passthrough_whitelist when starting n-cpu

    Loading up CONF.pci_passthrough_whitelist in the Whitelist object
    performs a bunch of validation and can fail in several different
    ways (invalid json, invalid values, invalid combinations of keys,
    devices not found, etc).

    This happens today when creating the PciDevTracker in the
    ResourceTracker when updating available resources. If the
    configuration is bad, it kills the periodic task to update
    available resources on the compute node.

    We should just load up the pci_passthrough_whitelist (if set)
    when starting the nova-compute service so we can fail fast and
    kill the service on any misconfiguration rather than run with
    a broken service.

    Change-Id: If50fb837b490042bb5ef20e9ad843b28f871a44e
    Closes-Bug: #1603034

** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1603034

Title:
  pci whitelist exception will kill the periodic update of the
  hypervisor statistics

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) mitaka series:
  Confirmed

Bug description:
  An exception raised while loading the PCI whitelist causes the
  periodic hypervisor update loop to terminate, and the update is
  never tried again. Retries should continue at the normal interval.

  Scenario 1:

  Set pci_passthrough_whitelist in nova.conf as follows:

  pci_passthrough_whitelist = [{"devname": "hed1", "physical_network": "physnet1"}, {"physical_network": "physnet1", "address": "*:04:00.0"}, {"physical_network": "physnet2", "address": "*:04:00.1"}]

  If hed1 is not present, the following error appears in the nova
  compute log. The compute service still shows as up, but the periodic
  hypervisor update stops working.

  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager [req-0e7e62d5-23c9-48f2-8ca4-b47b763c29df None None] Error updating resources for node padawan-cp1-comp0001-mgmt.
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager Traceback (most recent call last):
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager   File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/compute/manager.py", line 6472, in update_available_resource
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager     rt.update_available_resource(context)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager   File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 531, in update_available_resource
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager     self._update_available_resource(context, resources)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager   File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 271, in inner
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager     return f(*args, **kwargs)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager   File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 564, in _update_available_resource
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager     node_id=n_id)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager   File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/manager.py", line 68, in __init__
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager     self.dev_filter = whitelist.Whitelist(CONF.pci_passthrough_whitelist)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager   File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/whitelist.py", line 78, in __init__
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager     self.specs = self._parse_white_list_from_config(whitelist_spec)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager   File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/whitelist.py", line 59, in _parse_white_list_from_config
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager     spec = devspec.PciDeviceSpec(ds)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager   File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/devspec.py", line 134, in __init__
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager     self._init_dev_details()
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager   File "/opt/stack/venv/nova-20160607T195234Z/lib/python2.7/site-packages/nova/pci/devspec.py", line 155, in _init_dev_details
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager     raise exception.PciDeviceNotFoundById(id=self.dev_name)
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager PciDeviceNotFoundById: PCI device hed1 not found
  2016-07-13 09:22:42.146 28800 ERROR nova.compute.manager

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1603034/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
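
For readers who want the fail-fast idea from the commit message in concrete form, here is a minimal sketch in plain Python. It is not nova's code: WhitelistConfigError, load_whitelist, start_compute_service and the present_devices argument are hypothetical names invented for illustration, and the only checks shown are JSON validity and whether a named device exists. The point is just the ordering: validate the whitelist once at service start, so a bad value stops the service instead of silently killing a periodic task later.

```python
import json


class WhitelistConfigError(Exception):
    """Hypothetical error for a bad whitelist value (not a nova exception)."""


def load_whitelist(raw, present_devices):
    """Parse a JSON whitelist string and validate every entry.

    `present_devices` is an assumed stand-in for a lookup of the network
    device names that actually exist on the host.
    """
    try:
        entries = json.loads(raw)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        raise WhitelistConfigError("invalid JSON: %s" % exc)
    if not isinstance(entries, list):
        raise WhitelistConfigError("expected a JSON list of entries")
    for entry in entries:
        if not isinstance(entry, dict):
            raise WhitelistConfigError("entry is not an object: %r" % (entry,))
        devname = entry.get("devname")
        if devname is not None and devname not in present_devices:
            raise WhitelistConfigError("PCI device %s not found" % devname)
    return entries


def start_compute_service(raw_whitelist, present_devices):
    """Validate configuration before any periodic work is scheduled.

    Any WhitelistConfigError propagates to the caller, so the service
    fails at startup rather than running in a broken state.
    """
    specs = []
    if raw_whitelist:  # only validate when the option is set
        specs = load_whitelist(raw_whitelist, present_devices)
    # A real service would start its periodic tasks here.
    return specs
```

With the whitelist from Scenario 1 and a host that has no hed1 device, start_compute_service raises immediately, which mirrors the "fail fast and kill the service" behaviour the commit describes.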

