Public bug reported: We're getting a pretty increased failure rate with large-ops jobs in Nova, and it looks like it's around message timeouts in tearing down networks. This very distinctly started showing up this week in the race to merge code.
Example failure: http://logs.openstack.org/29/220229/3/check/gate- tempest-dsvm-large- ops/4b9bd8f/logs/screen-n-cpu-1.txt.gz#_2015-09-04_10_47_08_361 2015-09-04 10:47:08.226 ERROR nova.compute.manager [req-bd86a3c8-121b-43c9-ac02-c1d3dcdb2136 tempest-TestLargeOpsScenario-1437588360 tempest-TestLargeOpsScenario-2081713896] [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] Failed to deallocate network for instance. 2015-09-04 10:47:08.329 DEBUG nova.compute.manager [req-08f02702-32ba-44f2-aef3-269a0eff3550 tempest-TestLargeOpsScenario-1437588360 tempest-TestLargeOpsScenario-2081713896] [instance: 2999f5a3-746b-4c7a-9dcd-0f82c031f540] terminating bdm BlockDeviceMapping(boot_index=0,connection_info=None,created_at=2015-09-04T10:42:57Z,delete_on_termination=True,deleted=False,deleted_at=None,destination_type='local',device_name='/dev/sda',device_type='disk',disk_bus=None,guest_format=None,id=281,image_id='4850537c-18cc-438b-b154-1aafb3c36ec8',instance=<?>,instance_uuid=2999f5a3-746b-4c7a-9dcd-0f82c031f540,no_device=False,snapshot_id=None,source_type='image',updated_at=2015-09-04T10:43:54Z,volume_id=None,volume_size=None) _cleanup_volumes /opt/stack/new/nova/nova/compute/manager.py:2299 2015-09-04 10:47:08.361 ERROR nova.compute.manager [req-bd86a3c8-121b-43c9-ac02-c1d3dcdb2136 tempest-TestLargeOpsScenario-1437588360 tempest-TestLargeOpsScenario-2081713896] [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] Setting instance vm_state to ERROR 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] Traceback (most recent call last): 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/compute/manager.py", line 2396, in do_terminate_instance 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] self._delete_instance(context, instance, bdms, quotas) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/hooks.py", line 149, in inner 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] rv = f(*args, **kwargs) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/compute/manager.py", line 2375, in _delete_instance 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] quotas.rollback() 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 195, in __exit__ 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] six.reraise(self.type_, self.value, self.tb) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/compute/manager.py", line 2338, in _delete_instance 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] self._shutdown_instance(context, instance, bdms) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/compute/manager.py", line 2265, in _shutdown_instance 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] self._try_deallocate_network(context, instance, requested_networks) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/compute/manager.py", line 2194, in _try_deallocate_network 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] self._set_instance_obj_error_state(context, instance) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 195, in __exit__ 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] six.reraise(self.type_, self.value, self.tb) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/compute/manager.py", line 2189, in _try_deallocate_network 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] self._deallocate_network(context, instance, requested_networks) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/compute/manager.py", line 1812, in _deallocate_network 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] context, instance, requested_networks=requested_networks) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/network/api.py", line 49, in wrapped 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] return func(self, context, *args, **kwargs) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/network/api.py", line 297, in deallocate_for_instance 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] requested_networks=requested_networks) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/network/rpcapi.py", line 175, in deallocate_for_instance 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] return cctxt.call(ctxt, 'deallocate_for_instance', **kwargs) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] retry=self.retry) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] timeout=timeout, retry=retry) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 431, in send 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] retry=retry) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 420, in _send 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] result = self._waiter.wait(msg_id, timeout) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 318, in wait 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] message = self.waiters.get(msg_id, timeout=timeout) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 223, in get 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] 'to message ID %s' % msg_id) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] MessagingTimeout: Timed out waiting for a reply to message ID 4ce92e0a720b4f7fb1b26e1af1eff6ae 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] ** Affects: nova Importance: High Status: Confirmed ** Tags: gate ** Changed in: nova Importance: Undecided => High ** Changed in: nova Status: New => Confirmed ** Tags added: gate -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1492249 Title: regression in network performance causing large-ops jobs failures Status in OpenStack Compute (nova): Confirmed Bug description: We're getting a pretty increased failure rate with large-ops jobs in Nova, and it looks like it's around message timeouts in tearing down networks. This very distinctly started showing up this week in the race to merge code. Example failure: http://logs.openstack.org/29/220229/3/check/gate- tempest-dsvm-large- ops/4b9bd8f/logs/screen-n-cpu-1.txt.gz#_2015-09-04_10_47_08_361 2015-09-04 10:47:08.226 ERROR nova.compute.manager [req-bd86a3c8-121b-43c9-ac02-c1d3dcdb2136 tempest-TestLargeOpsScenario-1437588360 tempest-TestLargeOpsScenario-2081713896] [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] Failed to deallocate network for instance. 2015-09-04 10:47:08.329 DEBUG nova.compute.manager [req-08f02702-32ba-44f2-aef3-269a0eff3550 tempest-TestLargeOpsScenario-1437588360 tempest-TestLargeOpsScenario-2081713896] [instance: 2999f5a3-746b-4c7a-9dcd-0f82c031f540] terminating bdm BlockDeviceMapping(boot_index=0,connection_info=None,created_at=2015-09-04T10:42:57Z,delete_on_termination=True,deleted=False,deleted_at=None,destination_type='local',device_name='/dev/sda',device_type='disk',disk_bus=None,guest_format=None,id=281,image_id='4850537c-18cc-438b-b154-1aafb3c36ec8',instance=<?>,instance_uuid=2999f5a3-746b-4c7a-9dcd-0f82c031f540,no_device=False,snapshot_id=None,source_type='image',updated_at=2015-09-04T10:43:54Z,volume_id=None,volume_size=None) _cleanup_volumes /opt/stack/new/nova/nova/compute/manager.py:2299 2015-09-04 10:47:08.361 ERROR nova.compute.manager [req-bd86a3c8-121b-43c9-ac02-c1d3dcdb2136 tempest-TestLargeOpsScenario-1437588360 tempest-TestLargeOpsScenario-2081713896] [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] Setting instance vm_state to ERROR 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] Traceback (most recent call last): 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/compute/manager.py", line 2396, in do_terminate_instance 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] self._delete_instance(context, instance, bdms, quotas) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/hooks.py", line 149, in inner 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] rv = f(*args, **kwargs) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/compute/manager.py", line 2375, in _delete_instance 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] quotas.rollback() 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 195, in __exit__ 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] six.reraise(self.type_, self.value, self.tb) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/compute/manager.py", line 2338, in _delete_instance 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] self._shutdown_instance(context, instance, bdms) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/compute/manager.py", line 2265, in _shutdown_instance 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] self._try_deallocate_network(context, instance, requested_networks) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/compute/manager.py", line 2194, in _try_deallocate_network 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] self._set_instance_obj_error_state(context, instance) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 195, in __exit__ 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] six.reraise(self.type_, self.value, self.tb) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/compute/manager.py", line 2189, in _try_deallocate_network 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] self._deallocate_network(context, instance, requested_networks) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/compute/manager.py", line 1812, in _deallocate_network 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] context, instance, requested_networks=requested_networks) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/network/api.py", line 49, in wrapped 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] return func(self, context, *args, **kwargs) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/network/api.py", line 297, in deallocate_for_instance 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] requested_networks=requested_networks) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/opt/stack/new/nova/nova/network/rpcapi.py", line 175, in deallocate_for_instance 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] return cctxt.call(ctxt, 'deallocate_for_instance', **kwargs) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] retry=self.retry) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] timeout=timeout, retry=retry) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 431, in send 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] retry=retry) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 420, in _send 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] result = self._waiter.wait(msg_id, timeout) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 318, in wait 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] message = self.waiters.get(msg_id, timeout=timeout) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 223, in get 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] 'to message ID %s' % msg_id) 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] MessagingTimeout: Timed out waiting for a reply to message ID 4ce92e0a720b4f7fb1b26e1af1eff6ae 2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1492249/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : [email protected] Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp

