[Expired for OpenStack Compute (nova) because there has been no activity
for 60 days.]

** Changed in: nova
       Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2046332

Title:
  When the host is restarted, nova-compute startup gets stuck

Status in OpenStack Compute (nova):
  Expired

Bug description:
  Description:
  After a host machine reboot, the nova-compute service fails to start
properly: it gets stuck during startup, specifically in the createWithFlags
call, and remains in a non-operational state.

  Environment:
  OpenStack version: Wallaby
  libvirt version: 6.0.0
  QEMU version: 4.2.0
  Backend storage: iSCSI
  Instance using iSCSI commercial storage

  Observations:
  The nova-compute.log shows that the createWithFlags call blocks during
startup, preventing the nova-compute service from completing its startup
routine.
  The libvirt.log records repeated warnings and errors from
qemuDomainObjBeginJobInternal: subsequent jobs time out because they cannot
acquire the state change lock, which is held by
remoteDispatchDomainCreateWithFlags.
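  A quick way to confirm which libvirt RPC is holding the stuck lock is to
count the "held by monitor=..." markers in the libvirt log. A minimal sketch
(the helper name is ours; the sample line is taken from the libvirt.log
excerpt in this report):

```python
import re
from collections import Counter

def lock_holders(log_text):
    """Count which libvirt RPCs are reported as holding the state change
    lock in 'cannot acquire state change lock (held by monitor=...)' errors."""
    return Counter(re.findall(r"held by\s+monitor=(\w+)", log_text))

# Sample line from the libvirt.log excerpt in this report.
sample = (
    "14543: error : qemuDomainObjBeginJobInternal:9927 : Timed out during "
    "operation: cannot acquire state change lock (held by "
    "monitor=remoteDispatchDomainCreateWithFlags)"
)
print(lock_holders(sample))
# Counter({'remoteDispatchDomainCreateWithFlags': 1})
```

  A single RPC name dominating the output, as here, points at one wedged
createWithFlags call rather than general libvirtd overload.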

  Expected Behavior:
  The nova-compute service should successfully start up after a host machine 
reboot, allowing it to manage instances without getting stuck during the 
startup process.

  Steps to Reproduce:
  1. Reboot the host machine.
  2. Monitor the nova-compute service startup process.
  3. Observe that during instance bring-up the createWithFlags call blocks
and never returns to the caller, because of communication exceptions with
QEMU.

  Impact:
  The non-operational state of the nova-compute service affects the overall 
availability of instances and their management.

  Additional Information:
  The nova-compute service, during startup, initiates a series of operations 
including defineXML, createWithFlags, and lookupByUUIDString in order to bring 
up instances. However, the createWithFlags call remains stuck, leading to the 
service being unable to complete the startup process.
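  Because createWithFlags can block indefinitely when QEMU's monitor is
unresponsive, one possible mitigation is to run the call in a worker thread
and give up after a deadline instead of wedging the whole startup loop. A
minimal, self-contained sketch (the stuck_create_with_flags stand-in, the
helper names, and the timeout value are ours; this is not how nova itself
dispatches the call):

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

def guarded_call(fn, timeout_s):
    """Run a potentially blocking call in a worker thread and raise
    FutureTimeout if it has not returned within timeout_s seconds."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn).result(timeout=timeout_s)
    finally:
        # Do not wait for a stuck worker; let it linger in the background.
        pool.shutdown(wait=False)

def stuck_create_with_flags():
    # Stand-in for dom.createWithFlags(0) blocked on an unresponsive QEMU.
    time.sleep(2.0)
    return "running"

def try_start_instance():
    try:
        return guarded_call(stuck_create_with_flags, timeout_s=0.2)
    except FutureTimeout:
        return "timed out; flag instance for manual recovery"

print(try_start_instance())
# timed out; flag instance for manual recovery
```

  In the real driver the guarded function would be the libvirt
dom.createWithFlags(...) call; the timeout only stops the startup loop from
blocking, it does not recover the instance, which still needs the underlying
QEMU/iSCSI communication problem resolved.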

  Logs:
  nova-compute.log
  2023-12-04 14:13:43.582 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.582449438+08:00 stdout F 2023-12-04 14:13:43.582 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Scanning host 9 c: -, t: -, l: 10)
  2023-12-04 14:13:43.664 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.66419754+08:00 stdout F 2023-12-04 14:13:43.663 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 2 and hctl ('3', '0', '0', 10) yield: sdn
  2023-12-04 14:13:43.668 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.668415615+08:00 stdout F 2023-12-04 14:13:43.668 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 3 and hctl ['4', '*', '*', 10] yield: sdo
  2023-12-04 14:13:43.741 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.74138962+08:00 stdout F 2023-12-04 14:13:43.740 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 1 and hctl ('2', '0', '0', 10) yield: sdp
  2023-12-04 14:13:43.743 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.743081142+08:00 stdout F 2023-12-04 14:13:43.742 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 6 and hctl ['7', '*', '*', 10] yield: sdq
  2023-12-04 14:13:43.744 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.744370001+08:00 stdout F 2023-12-04 14:13:43.744 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 8 and hctl ['9', '*', '*', 10] yield: sdr
  2023-12-04 14:13:43.744 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.744783448+08:00 stdout F 2023-12-04 14:13:43.744 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 5 and hctl ('6', '0', '0', 10) yield: sdt
  2023-12-04 14:13:43.745 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:43.745437018+08:00 stdout F 2023-12-04 14:13:43.745 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 7 and hctl ['8', '*', '*', 10] yield: sds
  2023-12-04 14:13:44.226 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:44.226460766+08:00 stdout F 2023-12-04 14:13:44.225 32029 INFO os_brick.initiator.linuxscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Searching for a device in session 4 and hctl ['5', '*', '*', 10] yield: sdu
  2023-12-04 14:13:44.482 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:44.482754517+08:00 stdout F 2023-12-04 14:13:44.482 32029 INFO os_brick.initiator.connectors.iscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] All connection threads finished, giving 10 seconds for dm to appear.
  2023-12-04 14:13:45.484 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:45.484445914+08:00 stdout F 2023-12-04 14:13:45.483 32029 INFO os_brick.initiator.connectors.iscsi [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Result for connection is {'type': 'block', 'scsi_wwn': '36001405acbd43e79b732d343028329ed', 'path': '/dev/dm-3', 'multipath_id': '36001405acbd43e79b732d343028329ed'}.
  2023-12-04 14:13:45.514 +0800 ¦ node-12 ¦ nova-compute-2plv4 ¦ nova-compute ¦ 2023-12-04T14:13:45.514589463+08:00 stdout F 2023-12-04 14:13:45.514 32029 INFO os_vif [req-717433ee-a291-45ca-81ee-05150953b496 - - - - -] Successfully plugged vif VIFOpenVSwitch(active=True,address=fa:16:3e:e2:3c:03,bridge_name='br-int',has_traffic_filtering=True,id=ff32db4b-0a4a-4f89-bc17-eb261d55ef67,network=Network(d8955418-a63d-4f22-86cc-037e0cd7a291),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tapff32db4b-0a')
  2023-12-04 15:09:28.512 +0800 ¦ node-12 ¦ nova-compute-g7pf8 ¦ nova-compute ¦ 2023-12-04T15:09:28.51286769+08:00 stderr F ++ hostname -s
  2023-12-04 15:09:28.513 +0800 ¦ node-12 ¦ nova-compute-g7pf8 ¦ nova-compute ¦ 2023-12-04T15:09:28.5139163+08:00 stderr F + host_k8s=node-12
  2023-12-04 15:09:28.514 +0800 ¦ node-12 ¦ nova-compute-g7pf8 ¦ nova-compute ¦ 2023-12-04T15:09:28.514304455+08:00 stderr F + grep hostha=enabled

  libvirt.log
  2023-12-04 14:13:35.707 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 
2023-12-04T14:13:35.707262365+08:00 stderr F 2023-12-04 06:13:35.707+0000: 
14541: info : hostname: node-12.domain.tld
  2023-12-04 14:13:35.707 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 
2023-12-04T14:13:35.707265536+08:00 stderr F 2023-12-04 06:13:35.707+0000: 
14541: warning : qemuDomainDefValidate:5966 : CPU topology doesn't match numa 
CPU count; partial NUMA mapping is obsoleted and will be removed in future
  2023-12-04 14:13:35.709 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 
2023-12-04T14:13:35.709217034+08:00 stderr F 2023-12-04 06:13:35.709+0000: 
14540: warning : qemuDomainDefValidate:5966 : CPU topology doesn't match numa 
CPU count; partial NUMA mapping is obsoleted and will be removed in future
  2023-12-04 14:13:45.528 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 
2023-12-04T14:13:45.528828631+08:00 stderr F 2023-12-04 06:13:45.528+0000: 
14543: warning : qemuDomainDefValidate:5966 : CPU topology doesn't match numa 
CPU count; partial NUMA mapping is obsoleted and will be removed in future
  2023-12-04 14:13:45.531 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 
2023-12-04T14:13:45.531482963+08:00 stderr F 2023-12-04 06:13:45.531+0000: 
14544: warning : qemuDomainDefValidate:5966 : CPU topology doesn't match numa 
CPU count; partial NUMA mapping is obsoleted and will be removed in future
  2023-12-04 14:18:44.472 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 
2023-12-04T14:18:44.472148283+08:00 stderr F 2023-12-04 06:18:44.472+0000: 
14543: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, 
none, none) for domain instance-0000019c; current job is (async nested, none, 
start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 
remoteDispatchDomainCreateWithFlags (flags=0x0)) for (298s, 0s, 298s)
  2023-12-04 14:18:44.472 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 
2023-12-04T14:18:44.472176398+08:00 stderr F 2023-12-04 06:18:44.472+0000: 
14543: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: 
cannot acquire state change lock (held by 
monitor=remoteDispatchDomainCreateWithFlags)
  2023-12-04 14:19:14.490 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 
2023-12-04T14:19:14.490144202+08:00 stderr F 2023-12-04 06:19:14.490+0000: 
14540: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, 
none, none) for domain instance-0000019c; current job is (async nested, none, 
start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 
remoteDispatchDomainCreateWithFlags (flags=0x0)) for (328s, 0s, 328s)
  2023-12-04 14:19:14.490 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 
2023-12-04T14:19:14.490163367+08:00 stderr F 2023-12-04 06:19:14.490+0000: 
14540: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: 
cannot acquire state change lock (held by 
monitor=remoteDispatchDomainCreateWithFlags)
  2023-12-04 14:19:44.542 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 
2023-12-04T14:19:44.542135682+08:00 stderr F 2023-12-04 06:19:44.542+0000: 
14542: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, 
none, none) for domain instance-0000019c; current job is (async nested, none, 
start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 
remoteDispatchDomainCreateWithFlags (flags=0x0)) for (358s, 0s, 359s)
  2023-12-04 14:19:44.542 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 
2023-12-04T14:19:44.542158518+08:00 stderr F 2023-12-04 06:19:44.542+0000: 
14542: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: 
cannot acquire state change lock (held by 
monitor=remoteDispatchDomainCreateWithFlags)
  2023-12-04 14:20:14.590 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 
2023-12-04T14:20:14.590167336+08:00 stderr F 2023-12-04 06:20:14.590+0000: 
14541: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, 
none, none) for domain instance-0000019c; current job is (async nested, none, 
start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 
remoteDispatchDomainCreateWithFlags (flags=0x0)) for (388s, 0s, 389s)
  2023-12-04 14:20:14.590 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 
2023-12-04T14:20:14.590188481+08:00 stderr F 2023-12-04 06:20:14.590+0000: 
14541: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: 
cannot acquire state change lock (held by 
monitor=remoteDispatchDomainCreateWithFlags)
  2023-12-04 14:20:44.628 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 
2023-12-04T14:20:44.62812621+08:00 stderr F 2023-12-04 06:20:44.628+0000: 
14543: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, 
none, none) for domain instance-0000019c; current job is (async nested, none, 
start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 
remoteDispatchDomainCreateWithFlags (flags=0x0)) for (419s, 0s, 419s)
  2023-12-04 14:20:44.628 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 
2023-12-04T14:20:44.628149564+08:00 stderr F 2023-12-04 06:20:44.628+0000: 
14543: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: 
cannot acquire state change lock (held by 
monitor=remoteDispatchDomainCreateWithFlags)
  2023-12-04 15:09:01.199 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 
2023-12-04T15:09:01.199162919+08:00 stderr F 2023-12-04 07:09:01.199+0000: 
14543: warning : qemuDomainObjBeginJobInternal:9905 : Cannot start job (query, 
none, none) for domain instance-0000019c; current job is (async nested, none, 
start) owned by (14544 remoteDispatchDomainCreateWithFlags, 0 <null>, 14544 
remoteDispatchDomainCreateWithFlags (flags=0x0)) for (3315s, 0s, 3315s)
  ......
  2023-12-04 15:09:01.199 +0800 ¦ node-12 ¦ libvirt-lbr6f ¦ libvirt ¦ 
2023-12-04T15:09:01.199191359+08:00 stderr F 2023-12-04 07:09:01.199+0000: 
14543: error : qemuDomainObjBeginJobInternal:9927 : Timed out during operation: 
cannot acquire state change lock (held by 
monitor=remoteDispatchDomainCreateWithFlags)

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2046332/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
