Public bug reported:

Nova's instance shutdown logic treats the 'in shutdown' state as
'shutdown successful' before the guest has actually stopped. This cuts
the graceful shutdown short and can cause destroy failures.

When a stop command is issued for VMs with PCI passthrough devices (e.g.,
GPUs), shutdown can take considerably longer than for VMs without them:
- 4 GPUs, 1 TB memory: ~1 minute 20 seconds to shut down
- 8 GPUs, 2 TB memory: ~2 minutes 10 seconds to shut down

The problem is that Nova treats the 'in shutdown' state, in which the
shutdown is still in progress, as 'shutdown successful'. As a result, the
graceful shutdown logic does not run to completion, and Nova can attempt
to destroy the instance before the guest has finished shutting down. This
can result in errors such as:

  "Cannot destroy instance, general system call failure: libvirt.libvirtError:
  Failed to terminate process 1910551 with SIGKILL: Device or resource busy"
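To illustrate the failure mode, here is a minimal Python sketch. It
assumes that the report's 'in shutdown' state corresponds to libvirt's
VIR_DOMAIN_SHUTDOWN ("the domain is being shut down") and that the
finished state is VIR_DOMAIN_SHUTOFF. The constant values mirror libvirt's
virDomainState enum, but the helpers is_stopped_collapsed,
is_stopped_strict and polls_until_stopped are hypothetical, not Nova code.
A check that lumps both states together reports success as soon as the
guest starts shutting down; a strict check keeps waiting until it is off.

```python
from typing import Callable, Iterable, Optional

# Values of libvirt's virDomainState enum (libvirt public API).
VIR_DOMAIN_NOSTATE = 0
VIR_DOMAIN_RUNNING = 1
VIR_DOMAIN_BLOCKED = 2
VIR_DOMAIN_PAUSED = 3
VIR_DOMAIN_SHUTDOWN = 4  # "being shut down": the guest is still running
VIR_DOMAIN_SHUTOFF = 5   # "shut off": the guest has actually stopped
VIR_DOMAIN_CRASHED = 6
VIR_DOMAIN_PMSUSPENDED = 7


def is_stopped_collapsed(state: int) -> bool:
    """Hypothetical check mirroring the reported behaviour:
    'being shut down' is treated the same as 'shut off'."""
    return state in (VIR_DOMAIN_SHUTDOWN, VIR_DOMAIN_SHUTOFF, VIR_DOMAIN_CRASHED)


def is_stopped_strict(state: int) -> bool:
    """Hypothetical check: only a domain that is really off (or crashed)
    counts as stopped; VIR_DOMAIN_SHUTDOWN means 'keep waiting'."""
    return state in (VIR_DOMAIN_SHUTOFF, VIR_DOMAIN_CRASHED)


def polls_until_stopped(
    states: Iterable[int], is_stopped: Callable[[int], bool]
) -> Optional[int]:
    """Return the index of the first poll at which the check reports the
    instance as stopped, or None if it never does."""
    for i, state in enumerate(states):
        if is_stopped(state):
            return i
    return None


# Simulated polls of a large PCI-passthrough guest that is slow to stop.
observed = [
    VIR_DOMAIN_RUNNING,
    VIR_DOMAIN_SHUTDOWN,
    VIR_DOMAIN_SHUTDOWN,
    VIR_DOMAIN_SHUTOFF,
]
print(polls_until_stopped(observed, is_stopped_collapsed))  # 1
print(polls_until_stopped(observed, is_stopped_strict))     # 3
```

With the collapsed check, the instance is declared stopped on the second
poll, while the guest is still shutting down; any destroy issued at that
point races the guest's own teardown.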

This behavior undermines the shutdown_timeout configuration option and
the os_shutdown_timeout image property, which exist to give guests time
to shut down gracefully. By misinterpreting the 'in shutdown' state, Nova
may initiate destroy operations too early, leading to abnormal
terminations and potential data integrity issues.
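The intended interplay with the timeout settings can be sketched as a
bounded polling loop. This is a minimal Python sketch, not Nova's
implementation: graceful_stop, effective_timeout and the injected
callables (get_state, request_shutdown, destroy) are hypothetical. It
assumes that the os_shutdown_timeout image property overrides the
shutdown_timeout option, and that only VIR_DOMAIN_SHUTOFF or
VIR_DOMAIN_CRASHED (libvirt enum values 5 and 6) means the guest is done.
A guest still in VIR_DOMAIN_SHUTDOWN keeps the loop waiting, and destroy
is attempted only once the timeout has expired.

```python
import time
from typing import Any, Callable, Mapping

VIR_DOMAIN_SHUTOFF = 5  # libvirt: the domain is shut off
VIR_DOMAIN_CRASHED = 6  # libvirt: the domain is crashed
# VIR_DOMAIN_SHUTDOWN (4, "being shut down") is deliberately NOT a done state.
DONE_STATES = frozenset({VIR_DOMAIN_SHUTOFF, VIR_DOMAIN_CRASHED})


def effective_timeout(shutdown_timeout: int, image_props: Mapping[str, Any]) -> int:
    """Hypothetical helper: a per-image os_shutdown_timeout, if present,
    overrides the global shutdown_timeout value."""
    value = image_props.get("os_shutdown_timeout")
    return int(value) if value is not None else shutdown_timeout


def graceful_stop(
    get_state: Callable[[], int],
    request_shutdown: Callable[[], None],
    destroy: Callable[[], None],
    timeout: float,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Ask the guest to shut down, wait up to `timeout` seconds for it to be
    really off, and only then fall back to a hard destroy."""
    request_shutdown()
    deadline = clock() + timeout
    while clock() < deadline:
        if get_state() in DONE_STATES:
            return "clean"  # guest finished on its own; no destroy needed
        sleep(poll_interval)
    destroy()  # timeout expired: force the instance off
    return "forced"
```

For the 8-GPU example above (about 130 seconds to shut down), a
60-second budget would still end in a forced destroy, while a per-image
override such as os_shutdown_timeout=180 would let the loop observe a
clean shutoff. The clock and sleep functions are injectable so the
behaviour can be simulated without a hypervisor.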

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2091147

Title:
  Nova prematurely interprets 'in shutdown' state as 'shutdown
  successful' for VMs with PCI passthrough devices, hindering graceful
  shutdown

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2091147/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
