Public bug reported:

This bug describes a race condition during deletion of a bare metal (Ironic) instance when automated cleaning is enabled in Ironic (which is the default).
# Steps to reproduce

As a race condition, this one is not easy to reproduce, although it has been seen in the wild on several occasions. The easiest way to reproduce it is in a development environment (e.g. devstack), tweaking some configuration and adding some sleeps to particular operations in nova and ironic.

nova.conf configuration: reduce retries for ironic API polling. This needs to be long enough to allow normal operations to complete.

    [ironic]
    api_max_retries = 10

sleep #1: nova-compute waits for the node to move to cleaning (nova/virt/ironic/driver.py):

    try:
        if node.provision_state in _UNPROVISION_STATES:
            self._unprovision(instance, node)
        else:
            # NOTE(hshiina): if spawn() fails before ironic starts
            # provisioning, instance information should be
            # removed from ironic node.
            self._remove_instance_info_from_node(node, instance)
    finally:
        ##### HERE #######
        time.sleep(20)
        ##### HERE #######
        # NOTE(mgoddard): We don't need to remove instance info at this
        # point since we will have already done it. The destroy will only
        # succeed if this method returns without error, so we will end up
        # removing the instance info eventually.
        self._cleanup_deploy(node, instance, network_info,
                             remove_instance_info=False)

sleep #2: ironic conductor holds onto the node lock:

    @task_manager.require_exclusive_lock
    def do_node_clean(task, clean_steps=None, disable_ramdisk=False):
        """Internal RPC method to perform cleaning of a node.

        :param task: a TaskManager instance with an exclusive lock on its node
        :param clean_steps: For a manual clean, the list of clean steps to
                            perform. Is None For automated cleaning (default).
                            For more information, see the clean_steps parameter
                            of :func:`ConductorManager.do_node_clean`.
        :param disable_ramdisk: Whether to skip booting ramdisk for cleaning.
        """
        ##### HERE #####
        import time
        time.sleep(120)
        ##### HERE #####
        node = task.node
        manual_clean = clean_steps is not None

Next, create, then destroy a bare metal instance.
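The timing interaction in the steps above can be modeled with a few lines of Python. This is a minimal sketch, not nova or ironic code: `Conflict`, `call_with_retries` and the one-second retry interval are illustrative stand-ins for ironicclient's HTTP 409 handling and nova's [ironic] api_max_retries / api_retry_interval behaviour.

```python
class Conflict(Exception):
    """Stands in for ironicclient's HTTP 409 Conflict exception."""


def call_with_retries(node_locked_for, api_max_retries, retry_interval=1):
    """Simulate polling a locked node; return the attempt that succeeded,
    or raise Conflict once all retries are exhausted."""
    elapsed = 0
    for attempt in range(1, api_max_retries + 1):
        if elapsed >= node_locked_for:
            # Conductor has released the node lock; the call goes through.
            return attempt
        # Node still locked: back off and retry, as the client does on 409.
        elapsed += retry_interval
    raise Conflict("Node is locked, please retry after the current "
                   "operation is completed. (HTTP 409)")
```

With the reproducer's numbers — a 120-second cleaning lock against 10 short retries — every attempt hits the lock and the Conflict surfaces to the caller, which is exactly the failure in the logs below.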
# Expected results

The node is deleted.

# Actual results

The node moves to an ERROR state. Logs from nova-compute:

2023-05-17 14:30:30.493 7 ERROR ironicclient.common.http [req-c8d567dc-7088-4bad-9a09-6ba33b6c7b4d 5ef38591b78a42928e8dfbdbaa90acd9 60d7786e349549eeb5a309b96251c789 - default default] Error contacting Ironic server: Node 2666c347-ff75-4c04-9059-9b890ad14b08 is locked by host mark-ironic.novalocal, please retry after the current operation is completed. (HTTP 409). Attempt 6 of 6: ironicclient.common.apiclient.exceptions.Conflict: Node 2666c347-ff75-4c04-9059-9b890ad14b08 is locked by host mark-ironic.novalocal, please retry after the current operation is completed. (HTTP 409)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server [req-c8d567dc-7088-4bad-9a09-6ba33b6c7b4d 5ef38591b78a42928e8dfbdbaa90acd9 60d7786e349549eeb5a309b96251c789 - default default] Exception during message handling: ironicclient.common.apiclient.exceptions.Conflict: Node 2666c347-ff75-4c04-9059-9b890ad14b08 is locked by host mark-ironic.novalocal, please retry after the current operation is completed. (HTTP 409)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/exception_wrapper.py", line 72, in wrapped
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     context, exc, binary)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_utils/excutils.py", line 227, in __exit__
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     self.force_reraise()
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     raise self.value
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/exception_wrapper.py", line 63, in wrapped
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     return f(self, context, *args, **kw)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/compute/manager.py", line 10693, in external_instance_event
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     event.tag)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/compute/manager.py", line 10498, in _process_instance_vif_deleted_event
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     self.driver.detach_interface(context, instance, vif)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1630, in detach_interface
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     self.unplug_vifs(instance, [vif])
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1591, in unplug_vifs
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     self._unplug_vifs(node, instance, network_info)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/ironic/driver.py", line 1558, in _unplug_vifs
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     port_id)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/ironic/client_wrapper.py", line 180, in call
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     return self._multi_getattr(client, method)(*args, **kwargs)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/ironicclient/v1/node.py", line 512, in vif_detach
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     global_request_id=global_request_id)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/ironicclient/v1/node.py", line 408, in delete
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     global_request_id=global_request_id)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/ironicclient/common/base.py", line 292, in _delete
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     headers=headers)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/ironicclient/common/http.py", line 426, in raw_request
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     return self._http_request(url, method, **kwargs)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/ironicclient/common/http.py", line 287, in wrapper
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     return func(self, url, method, **kwargs)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server   File "/var/lib/kolla/venv/lib/python3.6/site-packages/ironicclient/common/http.py", line 387, in _http_request
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server     error_json.get('debuginfo'), method, url)
2023-05-17 14:30:30.496 7 ERROR oslo_messaging.rpc.server ironicclient.common.apiclient.exceptions.Conflict: Node 2666c347-ff75-4c04-9059-9b890ad14b08 is locked by host mark-ironic.novalocal, please retry after the current operation is completed. (HTTP 409)

# Analysis

This issue is similar to https://bugs.launchpad.net/nova/+bug/1815799, where instance_uuid and instance_info were being cleaned up. Ironic deletes VIFs during tear down since Rocky [1], so we should also be safe to skip this in the case where the node has been deprovisioned.
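One possible shape of the fix suggested in the analysis is sketched below. This is a hedged illustration, not the actual nova patch: the `destroy` function, the node dict, and the injected callables only loosely model the real driver, and `_UNPROVISION_STATES` here is an illustrative subset of nova's constant. The idea is that when the node was deprovisioned, Ironic's tear down has already removed the VIF attachments, so the driver can skip the explicit unplug rather than contend for the node lock held by automated cleaning.

```python
# Illustrative subset of nova's _UNPROVISION_STATES constant.
_UNPROVISION_STATES = ("active", "deploy failed", "error")


def destroy(node, unprovision, unplug_vifs, remove_instance_info):
    """Sketch of a destroy path that only unplugs VIFs when Ironic did not.

    node: dict with a "provision_state" key (stand-in for the Ironic node).
    unprovision / unplug_vifs / remove_instance_info: callables standing in
    for the driver's _unprovision, _unplug_vifs and
    _remove_instance_info_from_node.
    """
    if node["provision_state"] in _UNPROVISION_STATES:
        # Ironic tears down the VIF attachments as part of deprovisioning
        # (since Rocky), so calling vif_detach afterwards would be a noop
        # that can only race with the cleaning lock (HTTP 409).
        unprovision(node)
    else:
        # spawn() failed before provisioning started: Ironic will not run
        # tear down, so clean up instance info and VIFs from the driver.
        remove_instance_info(node)
        unplug_vifs(node)
```

Compared with the current code quoted above, where `_cleanup_deploy` unplugs VIFs in a `finally` block on both paths, this variant confines the unplug to the path where Ironic never deployed the node.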
Debug logs confirm that the VIF cleanup is a noop:

2023-05-17 14:24:11.791 7 DEBUG nova.virt.ironic.driver [req-e419f330-fe14-4c4b-8d5a-b7a40bb77651 5ef38591b78a42928e8dfbdbaa90acd9 60d7786e349549eeb5a309b96251c789 - default default] VIF a5c1622b-eb13-43d3-a46d-aa5781a85cdb isn't attached to Ironic node 3152ff2d-a0cb-4e7a-bcc7-81266106fef2 _unplug_vifs /var/lib/kolla/venv/lib/python3.6/site-packages/nova/virt/ironic/driver.py:1561

[1] https://review.opendev.org/q/I8d683d2d506c97535b5a8f9a5de4c070c7e887df

** Affects: nova
   Importance: Undecided
       Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2019977

Title:
  Ironic VIF unplug in destroy may fail due to node locked by cleaning

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2019977/+subscriptions