Hello Novaites,

I've noticed that the Intel NFV CI has been failing all test runs for quite some time (at least a few days), always on the same tests around shelve/unshelve operations.

The shelve/unshelve Tempest tests always result in a timeout exception being raised, looking similar to the following, from [1]:

2016-05-22 22:25:30.697 13974 ERROR tempest.api.compute.base Traceback (most recent call last):
2016-05-22 22:25:30.697 13974 ERROR tempest.api.compute.base   File "tempest/api/compute/base.py", line 166, in server_check_teardown
2016-05-22 22:25:30.697 13974 ERROR tempest.api.compute.base     cls.server_id, 'ACTIVE')
2016-05-22 22:25:30.697 13974 ERROR tempest.api.compute.base   File "tempest/common/waiters.py", line 95, in wait_for_server_status
2016-05-22 22:25:30.697 13974 ERROR tempest.api.compute.base     raise exceptions.TimeoutException(message)
2016-05-22 22:25:30.697 13974 ERROR tempest.api.compute.base TimeoutException: Request timed out
2016-05-22 22:25:30.697 13974 ERROR tempest.api.compute.base Details: (ServerActionsTestJSON:tearDown) Server cae6fd47-0968-4922-a03e-3f2872e4eb52 failed to reach ACTIVE status and task state "None" within the required time (196 s). Current status: SHELVED_OFFLOADED. Current task state: None.

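For anyone not familiar with that waiter, here is a rough, simplified sketch (not the actual Tempest code) of what tempest/common/waiters.py's wait_for_server_status() is doing when it raises the TimeoutException above: poll the server's status until it matches what the test expects, and give up after the build timeout. The show_server callable, the timeout value and the task-state key below are illustrative stand-ins.

import time


class TimeoutException(Exception):
    pass


def wait_for_server_status(show_server, server_id, status,
                           timeout=196, interval=1):
    """Poll the server until it reaches the wanted status or we time out."""
    start = time.time()
    while True:
        body = show_server(server_id)
        if (body['status'] == status and
                body.get('OS-EXT-STS:task_state') is None):
            return
        if time.time() - start > timeout:
            raise TimeoutException(
                'Server %s failed to reach %s status and task state "None" '
                'within the required time (%s s). Current status: %s.'
                % (server_id, status, timeout, body['status']))
        time.sleep(interval)
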
I looked through the conductor and compute logs to see if I could find a possible cause, and found a number of errors like the following in the compute logs:

2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52] Traceback (most recent call last):
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/opt/stack/new/nova/nova/compute/manager.py", line 4230, in _unshelve_instance
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     with rt.instance_claim(context, instance, limits):
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     return f(*args, **kwargs)
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 151, in instance_claim
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     self._update_usage_from_instance(context, instance_ref)
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 827, in _update_usage_from_instance
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     self._update_usage(instance, sign=sign)
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 666, in _update_usage
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     self.compute_node, usage, free)
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/opt/stack/new/nova/nova/virt/hardware.py", line 1482, in get_host_numa_usage_from_instance
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     host_numa_topology, instance_numa_topology, free=free))
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/opt/stack/new/nova/nova/virt/hardware.py", line 1348, in numa_usage_from_instances
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     newcell.unpin_cpus(pinned_cpus)
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/opt/stack/new/nova/nova/objects/numa.py", line 94, in unpin_cpus
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     pinned=list(self.pinned_cpus))
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52] CPUPinningInvalid: Cannot pin/unpin cpus [6] from the following pinned set [0, 2, 4]

on or around the time of the failures in Tempest.
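
For reference, here is a simplified, self-contained sketch of the kind of check that produces this error in the unpin_cpus() frame (nova/objects/numa.py in the traceback above); the class below is a stand-in, not the real object. The interesting part from the log is that the tracker asks to unpin CPU 6 while the cell only has CPUs [0, 2, 4] pinned at the time usage is updated.

class CPUPinningInvalid(Exception):
    pass


class NUMACell(object):
    """Minimal stand-in for the NUMA cell object tracking pinned host CPUs."""

    def __init__(self, pinned_cpus):
        self.pinned_cpus = set(pinned_cpus)

    def unpin_cpus(self, cpus):
        cpus = set(cpus)
        # Unpinning a CPU that is not currently tracked as pinned is treated
        # as an inconsistency and raises, which is what the traceback shows.
        if not cpus.issubset(self.pinned_cpus):
            raise CPUPinningInvalid(
                'Cannot pin/unpin cpus %s from the following pinned set %s'
                % (sorted(cpus), sorted(self.pinned_cpus)))
        self.pinned_cpus -= cpus


# The mismatch from the log: unpin CPU 6 while only 0, 2 and 4 are pinned.
cell = NUMACell(pinned_cpus=[0, 2, 4])
try:
    cell.unpin_cpus([6])
except CPUPinningInvalid as e:
    print(e)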

Perhaps tomorrow morning we can look into handling the above exception properly in the compute manager, since clearly we shouldn't be allowing CPUPinningInvalid to be raised from the resource tracker's _update_usage() call.
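
To make the discussion concrete, here's a purely illustrative sketch (not a proposed patch) of the general shape such handling could take around the unshelve claim: catch the pinning error and put the instance into an error state, so the failure is visible immediately instead of the instance sitting in SHELVED_OFFLOADED until the Tempest waiter times out. The helper name, the stand-in exception class and the spawn callback are all assumptions for illustration; only rt.instance_claim() being used as a context manager comes from the traceback.

class CPUPinningInvalid(Exception):
    """Stand-in for nova.exception.CPUPinningInvalid."""


def unshelve_with_claim(resource_tracker, context, instance, limits, spawn):
    """Illustrative only: claim resources for an unshelving instance and
    surface CPU pinning inconsistencies explicitly instead of letting the
    claim blow up and leave the instance shelved-offloaded."""
    try:
        with resource_tracker.instance_claim(context, instance, limits):
            spawn(context, instance)
    except CPUPinningInvalid:
        # Hypothetical handling: mark the instance errored and re-raise so
        # the failure is reported right away rather than via a timeout.
        instance.vm_state = 'error'
        instance.task_state = None
        instance.save()
        raise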

Anyway, see you on IRC tomorrow morning and let's try to fix this.

Best,
-jay

[1] http://intel-openstack-ci-logs.ovh/86/319686/1/check/tempest-dsvm-full-nfv/b463722/testr_results.html.gz
