[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]
** Changed in: nova Status: Incomplete => Expired -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/2046332 Title: When the host is restarted, nova-compute startup gets stuck Status in OpenStack Compute (nova): Expired Bug description: Description: After a host machine reboot, the nova-compute service fails to start up properly. Upon investigation, it was found that the nova-compute service gets stuck during the startup process, specifically during the createWithFlags call. This results in the nova-compute service remaining in a non-operational state. Environment: OpenStack version: Wallaby libvirt version: 6.0.0 QEMU version: 4.2.0 Backend storage: iSCSI Instance using iSCSI commercial storage Observations: The nova-compute.log shows that during the startup process, the createWithFlags call keeps blocking, preventing the nova-compute service from completing its startup routine. The libvirt.log indicates multiple warnings and errors related to the qemuDomainObjBeginJobInternal function, resulting in the inability to acquire state change locks. Expected Behavior: The nova-compute service should successfully start up after a host machine reboot, allowing it to manage instances without getting stuck during the startup process. Steps to Reproduce: Reboot the host machine. Constructing the createWithFlags interface has been blocking and not returning to the client because of communication exceptions with qemu. Monitor the nova-compute service startup process. Impact: The non-operational state of the nova-compute service affects the overall availability of instances and their management. Additional Information: The nova-compute service, during startup, initiates a series of operations including defineXML, createWithFlags, and lookupByUUIDString in order to bring up instances. However, the createWithFlags call remains stuck, leading to the service being unable to complete the startup process. Logs: nova-compute.log 2023-12-04 14:13:43.582 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.582449438+08:00 stdout F 2023-12-04 14:13:43.582 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Scanning host 9 c: -, t: -, l: 10)^[[00m 2023-12-04 14:13:43.664 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.66419754+08:00 stdout F 2023-12-04 14:13:43.663 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 2 and hctl ('3', '0', '0', 10) yield: sdn^[[00m 2023-12-04 14:13:43.668 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.668415615+08:00 stdout F 2023-12-04 14:13:43.668 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 3 and hctl ['4', '*', '*', 10] yield: sdo^[[00m 2023-12-04 14:13:43.741 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.74138962+08:00 stdout F 2023-12-04 14:13:43.740 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 1 and hctl ('2', '0', '0', 10) yield: sdp^[[00m 2023-12-04 14:13:43.743 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.743081142+08:00 stdout F 2023-12-04 14:13:43.742 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 6 and hctl ['7', '*', '*', 10] yield: sdq^[[00m 2023-12-04 14:13:43.744 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.744370001+08:00 stdout F 2023-12-04 14:13:43.744 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 8 and hctl ['9', '*', '*', 10] yield: sdr^[[00m 2023-12-04 14:13:43.744 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.744783448+08:00 stdout F 2023-12-04 14:13:43.744 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 5 and hctl ('6', '0', '0', 10) yield: sdt^[[00m 2023-12-04 14:13:43.745 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.745437018+08:00 stdout F 2023-12-04 14:13:43.745 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 7 and hctl ['8', '*', '*', 10] yield: sds^[[00m 2023-12-04 14:13:44.226 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:44.226460766+08:00 stdout F 2023-12-04 14:13:44.225 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 4 and hctl ['5', '*', '*', 10] yield: sdu^[[00m 2023-12-04 14:13:44.482 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:44.482754517+08:00 stdout F 2023-12-04 14:13:44.482 32029 INFO os_brick.initiator.connectors.iscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] All connection threads finished, giving 10 seconds for dm to appear.^[[00m 2023-12-04 14:13:45.484 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:45.484445914+08:00 stdout F 2023-12-04 14:13:45.483 32029 INFO os_brick.initiator.connectors.iscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Result for connection is {'type': 'block', 'scsi_wwn': '36001405acbd43e79b732d343028329ed', 'path': '/dev/dm-3', 'multipath_id': '36001405acbd43e79b732d343028329ed'}.^[[00m 2023-12-04 14:13:45.514 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:45.514589463+08:00 stdout F 2023-12-04 14:13:45.514 32029 INFO os_vif [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Successfully plugged vif VIFOpenVSwitch(active=True,address=fa:16:3e:e2:3c:03,bridge_name='br-int',has_traffic_filtering=True,id=ff32db4b-0a4a-4f89-bc17-eb261d55ef67,network=Network(d8955418-a63d-4f22-86cc-037e0cd7a291),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tapff32db4b-0a')^[[00m 2023-12-04 15:09:28.512 +0800 ¦ node-12 ¦ nova-compute-g7pf8 ¦ nova-compute ¦ 2023-12-04T15:09:28.51286769+08:00 stderr F ++ hostname -s 2023-12-04 15:09:28.513 +0800 ¦ node-12 ¦ nova-compute-g7pf8 ¦ nova-compute ¦ 2023-12-04T15:09:28.5139163+08:00 stderr F + host_k8s=node-12 2023-12-04 15:09:28.514 +0800 ¦ node-12 ¦ nova-compute-g7pf8 ¦ nova-compute ¦ 2023-12-04T15:09:28.514304455+08:00 stderr F + grep hostha=enabled libvirt.log 2023-12-04 14:13:35.707 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:13:35.707262365+08:00 stderr F 2023-12-04 06:13:35.707+0000: 14541: info : hostname: node-12.domain.tld 2023-12-04 14:13:35.707 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:13:35.707265536+08:00 stderr F 2023-12-04 06:13:35.707+0000: 14541: warning : qemuDomainDefValidate:5966 : CPU topology doesn't match numa CPU count; partial NUMA mapping is obsoleted and will be removed in future 2023-12-04 14:13:35.709 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:13:35.709217034+08:00 stderr F 2023-12-04 06:13:35.709+0000: 14540: warning : qemuDomainDefValidate:5966 : CPU topology doesn't match numa CPU count; partial NUMA mapping is obsoleted and will be removed in future 2023-12-04 14:13:45.528 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:13:45.528828631+08:00 stderr F 2023-12-04 06:13:45.528+0000: 14543: warning : qemuDomainDefValidate:5966 : CPU topology doesn't match numa CPU count; partial NUMA mapping is obsoleted and will be removed in future 2023-12-04 14:13:45.531 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:13:45.531482963+08:00 stderr F 2023-12-04 06:13:45.531+0000: 14544: warning : qemuDomainDefValidate:5966 : CPU topology doesn't match numa CPU count; partial NUMA mapping is obsoleted and will be removed in future 2023-12-04 14:18:44.472 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:18:44.472148283+08:00 stderr F 2023-12-04 06:18:44.472+0000: 14543: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, none, none) for domain instance-0000019c; current job is (async nested, none, start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 remoteDispatchDomainCreateWithFlags (flags=0x0)) for (298s, 0s, 298s) 2023-12-04 14:18:44.472 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:18:44.472176398+08:00 stderr F 2023-12-04 06:18:44.472+0000: 14543: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainCreateWithFlags) 2023-12-04 14:19:14.490 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:19:14.490144202+08:00 stderr F 2023-12-04 06:19:14.490+0000: 14540: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, none, none) for domain instance-0000019c; current job is (async nested, none, start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 remoteDispatchDomainCreateWithFlags (flags=0x0)) for (328s, 0s, 328s) 2023-12-04 14:19:14.490 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:19:14.490163367+08:00 stderr F 2023-12-04 06:19:14.490+0000: 14540: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainCreateWithFlags) 2023-12-04 14:19:44.542 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:19:44.542135682+08:00 stderr F 2023-12-04 06:19:44.542+0000: 14542: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, none, none) for domain instance-0000019c; current job is (async nested, none, start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 remoteDispatchDomainCreateWithFlags (flags=0x0)) for (358s, 0s, 359s) 2023-12-04 14:19:44.542 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:19:44.542158518+08:00 stderr F 2023-12-04 06:19:44.542+0000: 14542: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainCreateWithFlags) 2023-12-04 14:20:14.590 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:20:14.590167336+08:00 stderr F 2023-12-04 06:20:14.590+0000: 14541: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, none, none) for domain instance-0000019c; current job is (async nested, none, start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 remoteDispatchDomainCreateWithFlags (flags=0x0)) for (388s, 0s, 389s) 2023-12-04 14:20:14.590 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:20:14.590188481+08:00 stderr F 2023-12-04 06:20:14.590+0000: 14541: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainCreateWithFlags) 2023-12-04 14:20:44.628 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:20:44.62812621+08:00 stderr F 2023-12-04 06:20:44.628+0000: 14543: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, none, none) for domain instance-0000019c; current job is (async nested, none, start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 remoteDispatchDomainCreateWithFlags (flags=0x0)) for (419s, 0s, 419s) 2023-12-04 14:20:44.628 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T14:20:44.628149564+08:00 stderr F 2023-12-04 06:20:44.628+0000: 14543: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainCreateWithFlags) 2023-12-04 15:09:01.199 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T15:09:01.199162919+08:00 stderr F 2023-12-04 07:09:01.199+0000: 14543: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, none, none) for domain instance-0000019c; current job is (async nested, none, start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 remoteDispatchDomainCreateWithFlags (flags=0x0)) for (3315s, 0s, 3315s) ...... 2023-12-04 15:09:01.199 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 2023-12-04T15:09:01.199191359+08:00 stderr F 2023-12-04 07:09:01.199+0000: 14543: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainCreateWithFlags) To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/2046332/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp