Reviewed:  https://review.openstack.org/550865
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1e77faaa412ab9909dd9491cab4a819b5c84d3e8
Submitter: Zuul
Branch:    master

commit 1e77faaa412ab9909dd9491cab4a819b5c84d3e8
Author: Eric M Gonzalez <[email protected]>
Date:   Thu Mar 8 09:11:25 2018 -0600

    unquiesce instance after quiesce failure

    If the call to compute_rpcapi.quiesce_instance() raises an exception,
    any uncaught exception will break out of the function
    snapshot_volume_backed(). This can leave the instance in a frozen
    state.

    This patch adds a blanket Exception catch to the try block and calls
    compute_rpcapi.unquiesce_instance() before reraising.

    This has been seen in the wild with RPC timeouts, but this is not the
    only possible genesis for an unknown error from quiesce_instance.

    Change-Id: Idca5998da8bb42b29a8fffdf52b4af3a043c6326
    Closes-Bug: #1754360

** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1754360

Title:
  no unquiesce for volume backed on quiesce failure

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  Confirmed
Status in OpenStack Compute (nova) pike series:
  In Progress
Status in OpenStack Compute (nova) queens series:
  In Progress

Bug description:
  Extension of bug #1731986.

  The above bug and fix catch errors that occur during the snapshot of
  an instance's volumes. I later discovered that a failure can also
  occur during the call to quiesce_instance(). It raises an uncaught
  exception through snapshot_volume_backed(), which can leave the
  instance frozen / quiesced.

  Replication is tricky; my failures occur during the RPC call to the
  compute host, with a MessagingTimeout while waiting for a reply. I
  have not found a way to replicate this reliably.

  My compute combination is: Nova Mitaka, Libvirt-1.3.1, & Ceph Jewel

  Similar to the above bug, this condition was discovered in Mitaka and
  the issue remains in Queens.

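  As a rough illustration of the failure mode and of the cleanup pattern
  the commit message above describes (catch any exception from the
  quiesce call, issue an unquiesce, then re-raise), here is a minimal,
  self-contained sketch. It is not the nova code: FakeComputeRPCAPI, the
  local MessagingTimeout class, and the simplified
  snapshot_volume_backed() signature are hypothetical stand-ins so that
  the example runs without nova or oslo.messaging installed.

```python
import logging

LOG = logging.getLogger(__name__)


class MessagingTimeout(Exception):
    """Hypothetical local stand-in for the RPC timeout seen in the trace."""


class FakeComputeRPCAPI:
    """Hypothetical stub of the compute RPC API; it only records calls."""

    def __init__(self, fail_quiesce=False):
        self.fail_quiesce = fail_quiesce
        self.calls = []

    def quiesce_instance(self, context, instance):
        self.calls.append("quiesce")
        if self.fail_quiesce:
            # The guest may already be frozen even though no reply arrived.
            raise MessagingTimeout("Timed out waiting for a reply")

    def unquiesce_instance(self, context, instance):
        self.calls.append("unquiesce")


def snapshot_volume_backed(rpcapi, context, instance):
    """Simplified sketch: thaw the instance if the quiesce call fails."""
    try:
        rpcapi.quiesce_instance(context, instance)
    except Exception:
        # Blanket catch: the caller cannot know whether the freeze took
        # effect, so attempt a thaw before propagating the original error.
        LOG.exception("quiesce failed; issuing unquiesce")
        rpcapi.unquiesce_instance(context, instance)
        raise
    try:
        pass  # the volume snapshots would be taken here
    finally:
        rpcapi.unquiesce_instance(context, instance)


if __name__ == "__main__":
    api = FakeComputeRPCAPI(fail_quiesce=True)
    try:
        snapshot_volume_backed(api, context=None, instance="vm-1")
    except MessagingTimeout:
        pass
    print(api.calls)
```

  In the sketch the original exception still reaches the caller; the
  only difference from the unpatched behaviour is that an unquiesce is
  attempted first, so the stub records both calls.
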
  My proposed patch adds a blanket Exception catch around the call to
  rpcapi.quiesce_instance(), logs the caught exception, and issues an
  immediate rpcapi.unquiesce_instance() in order to thaw the instance.

  Stack trace from nova-api-os container, responsible for quiesce /
  unquiesce of instance during snapshot:

  [req-6229d689-dcc3-41ca-99b5-3dfc04e1e994 50505ffa89754660b4e6f7ebf69532b5 24bfcdab70714b85b5cb9f5f8270a414 - - -] Unexpected exception in API method
  Traceback (most recent call last):
    File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/nova/api/openstack/extensions.py", line 478, in wrapped
      return f(*args, **kwargs)
    File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/nova/api/openstack/common.py", line 391, in inner
      return f(*args, **kwargs)
    File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 73, in wrapper
      return func(*args, **kwargs)
    File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/nova/api/validation/__init__.py", line 73, in wrapper
      return func(*args, **kwargs)
    File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/nova/api/openstack/compute/servers.py", line 1108, in _action_create_image
      metadata)
    File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/nova/compute/api.py", line 140, in inner
      return f(self, context, instance, *args, **kw)
    File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/nova/compute/api.py", line 2389, in snapshot_volume_backed
      mapping=None)
    File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
      self.force_reraise()
    File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
      six.reraise(self.type_, self.value, self.tb)
    File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/nova/compute/api.py", line 2368, in snapshot_volume_backed
      self.compute_rpcapi.quiesce_instance(context, instance)
    File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/nova/compute/rpcapi.py", line 1041, in quiesce_instance
      return cctxt.call(ctxt, 'quiesce_instance', instance=instance)
    File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 158, in call
      retry=self.retry)
    File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/oslo_messaging/transport.py", line 90, in _send
      timeout=timeout, retry=retry)
    File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 470, in send
      retry=retry)
    File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 459, in _send
      result = self._waiter.wait(msg_id, timeout)
    File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 342, in wait
      message = self.waiters.get(msg_id, timeout=timeout)
    File "/openstack/venvs/nova-13.3.7/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 244, in get
      'to message ID %s' % msg_id)
  MessagingTimeout: Timed out waiting for a reply to message ID 70ee5f80284b4b68a289bf232b89325c

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1754360/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp

