Hello Novaites,

I've noticed that the Intel NFV CI has been failing all test runs for quite some time (at least a few days), always on the same tests around shelve/unshelve operations.

The shelve/unshelve Tempest tests always result in a timeout exception being raised, looking similar to the following, from [1]:

2016-05-22 22:25:30.697 13974 ERROR tempest.api.compute.base Traceback (most recent call last):
2016-05-22 22:25:30.697 13974 ERROR tempest.api.compute.base   File "tempest/api/compute/base.py", line 166, in server_check_teardown
2016-05-22 22:25:30.697 13974 ERROR tempest.api.compute.base     cls.server_id, 'ACTIVE')
2016-05-22 22:25:30.697 13974 ERROR tempest.api.compute.base   File "tempest/common/waiters.py", line 95, in wait_for_server_status
2016-05-22 22:25:30.697 13974 ERROR tempest.api.compute.base     raise exceptions.TimeoutException(message)
2016-05-22 22:25:30.697 13974 ERROR tempest.api.compute.base TimeoutException: Request timed out
2016-05-22 22:25:30.697 13974 ERROR tempest.api.compute.base Details: (ServerActionsTestJSON:tearDown) Server cae6fd47-0968-4922-a03e-3f2872e4eb52 failed to reach ACTIVE status and task state "None" within the required time (196 s). Current status: SHELVED_OFFLOADED. Current task state: None.

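For anyone not familiar with that waiter, here is a rough, simplified sketch (not the actual Tempest code) of what tempest/common/waiters.py's wait_for_server_status() is doing when it raises the TimeoutException above: poll the server's status until it matches what the test expects, and give up after the build timeout. The show_server callable, the timeout value and the task-state key below are illustrative stand-ins.

import time


class TimeoutException(Exception):
    pass


def wait_for_server_status(show_server, server_id, status,
                           timeout=196, interval=1):
    """Poll the server until it reaches the wanted status or we time out."""
    start = time.time()
    while True:
        body = show_server(server_id)
        if (body['status'] == status and
                body.get('OS-EXT-STS:task_state') is None):
            return
        if time.time() - start > timeout:
            raise TimeoutException(
                'Server %s failed to reach %s status and task state "None" '
                'within the required time (%s s). Current status: %s.'
                % (server_id, status, timeout, body['status']))
        time.sleep(interval)
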
I looked through the conductor and compute logs to see if I could find a possible cause, and found a number of errors like the following in the compute logs:

2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52] Traceback (most recent call last):
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/opt/stack/new/nova/nova/compute/manager.py", line 4230, in _unshelve_instance
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     with rt.instance_claim(context, instance, limits):
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     return f(*args, **kwargs)
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 151, in instance_claim
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     self._update_usage_from_instance(context, instance_ref)
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 827, in _update_usage_from_instance
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     self._update_usage(instance, sign=sign)
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/opt/stack/new/nova/nova/compute/resource_tracker.py", line 666, in _update_usage
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     self.compute_node, usage, free)
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/opt/stack/new/nova/nova/virt/hardware.py", line 1482, in get_host_numa_usage_from_instance
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     host_numa_topology, instance_numa_topology, free=free))
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/opt/stack/new/nova/nova/virt/hardware.py", line 1348, in numa_usage_from_instances
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     newcell.unpin_cpus(pinned_cpus)
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]   File "/opt/stack/new/nova/nova/objects/numa.py", line 94, in unpin_cpus
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52]     pinned=list(self.pinned_cpus))
2016-05-22 22:18:59.403 8145 ERROR nova.compute.manager [instance: cae6fd47-0968-4922-a03e-3f2872e4eb52] CPUPinningInvalid: Cannot pin/unpin cpus [6] from the following pinned set [0, 2, 4]

on or around the time of the failures in Tempest.
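
For reference, here is a simplified, self-contained sketch of the kind of check that produces this error in the unpin_cpus() frame (nova/objects/numa.py in the traceback above); the class below is a stand-in, not the real object. The interesting part from the log is that the tracker asks to unpin CPU 6 while the cell only has CPUs [0, 2, 4] pinned at the time usage is updated.

class CPUPinningInvalid(Exception):
    pass


class NUMACell(object):
    """Minimal stand-in for the NUMA cell object tracking pinned host CPUs."""

    def __init__(self, pinned_cpus):
        self.pinned_cpus = set(pinned_cpus)

    def unpin_cpus(self, cpus):
        cpus = set(cpus)
        # Unpinning a CPU that is not currently tracked as pinned is treated
        # as an inconsistency and raises, which is what the traceback shows.
        if not cpus.issubset(self.pinned_cpus):
            raise CPUPinningInvalid(
                'Cannot pin/unpin cpus %s from the following pinned set %s'
                % (sorted(cpus), sorted(self.pinned_cpus)))
        self.pinned_cpus -= cpus


# The mismatch from the log: unpin CPU 6 while only 0, 2 and 4 are pinned.
cell = NUMACell(pinned_cpus=[0, 2, 4])
try:
    cell.unpin_cpus([6])
except CPUPinningInvalid as e:
    print(e)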

Perhaps tomorrow morning we can look into handling the above exception properly in the compute manager, since clearly we shouldn't be allowing CPUPinningInvalid to be raised from the resource tracker's _update_usage() call.
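
To make the discussion concrete, here's a purely illustrative sketch (not a proposed patch) of the general shape such handling could take around the unshelve claim: catch the pinning error and put the instance into an error state, so the failure is visible immediately instead of the instance sitting in SHELVED_OFFLOADED until the Tempest waiter times out. The helper name, the stand-in exception class and the spawn callback are all assumptions for illustration; only rt.instance_claim() being used as a context manager comes from the traceback.

class CPUPinningInvalid(Exception):
    """Stand-in for nova.exception.CPUPinningInvalid."""


def unshelve_with_claim(resource_tracker, context, instance, limits, spawn):
    """Illustrative only: claim resources for an unshelving instance and
    surface CPU pinning inconsistencies explicitly instead of letting the
    claim blow up and leave the instance shelved-offloaded."""
    try:
        with resource_tracker.instance_claim(context, instance, limits):
            spawn(context, instance)
    except CPUPinningInvalid:
        # Hypothetical handling: mark the instance errored and re-raise so
        # the failure is reported right away rather than via a timeout.
        instance.vm_state = 'error'
        instance.task_state = None
        instance.save()
        raise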

Anyway, see you on IRC tomorrow morning and let's try to fix this.

Best,
-jay

[1] http://intel-openstack-ci-logs.ovh/86/319686/1/check/tempest-dsvm-full-nfv/b463722/testr_results.html.gz
