On February 13, 2026 1:14 pm, Fiona Ebner wrote:
> On 10.02.26 at 12:14 PM, Dominik Csapak wrote:
>> When qmeventd detects a vm exiting, it starts 'qm cleanup' to cleanup
>> files, executing hookscripts, etc.
>>
>> Since the vm process exits is sometimes not instant, wait up to 30
>> seconds here to start the cleanup process instead of immediately
>> aborting if the pid still exits. This prevented executing the hookscript
>> on the 'post-stop' phase.
>>
>> This can be easily reproduced by e.g. passing through a usb device,
>> which delays the qemu process exit for a few seconds.
>>
>> Signed-off-by: Dominik Csapak <[email protected]>
>> ---
>> changes from v1:
>> * use correct while condition (time() is always >= $starttime)
>>
>> original comment:
>>
>> The 30 second timeout was arbitrarily chosen, but we could probably
>> start with something smaller, like 10 seconds? Could be adapted on
>> applying though.
>>
>> In my (short) tests the usb passthrough part only adds a single second,
>> but i can imagine different devices on other systems could block it for
>> much longer.
>>
>>  src/PVE/CLI/qm.pm | 13 ++++++++++++-
>>  1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/PVE/CLI/qm.pm b/src/PVE/CLI/qm.pm
>> index bdae9641..16875ed2 100755
>> --- a/src/PVE/CLI/qm.pm
>> +++ b/src/PVE/CLI/qm.pm
>> @@ -1101,8 +1101,19 @@ __PACKAGE__->register_method({
>>              60,
>>              sub {
>>                  my $conf = PVE::QemuConfig->load_config($vmid);
>> +
>> +                # wait for some timeout until vm process exits, since this might not be instant
>
> s/timeout/time/
>
> Nit: s/vm/the QEMU/
>
> Maybe add "after the QMP 'SHUTDOWN' event"?
>
>> +                my $timeout = 30;
>> +                my $starttime = time();
>>                  my $pid = PVE::QemuServer::check_running($vmid);
>> -                die "vm still running\n" if $pid;
>> +                warn "vm still running - waiting up to $timeout seconds\n" if $pid;
>
> While we're at it, we could improve the message here. Something like
> 'QEMU process $pid for VM $vmid still running (or newly started)'
> Having the PID is nice info for developers/support engineers and the
> case where a new instance is started before the cleanup was done is also
> possible.
>
> In fact, the case with the new instance is easily triggered by 'stop'
> mode backups. Maybe we should fix that up first before adding a timeout
> here?
>
> Feb 13 13:09:48 pve9a1 qm[92975]: <root@pam> end task UPID:pve9a1:00016B30:000CDF80:698F1485:qmshutdown:102:root@pam: OK
> Feb 13 13:09:48 pve9a1 systemd[1]: Started 102.scope.
> Feb 13 13:09:48 pve9a1 qmeventd[93079]: Starting cleanup for 102
> Feb 13 13:09:48 pve9a1 qmeventd[93079]: trying to acquire lock...
> Feb 13 13:09:48 pve9a1 vzdump[92895]: VM 102 started with PID 93116.
> Feb 13 13:09:48 pve9a1 qmeventd[93079]: OK
> Feb 13 13:09:48 pve9a1 qmeventd[93079]: vm still running

Does this mean we should actually have some sort of mechanism, similar to the reboot flag, that indicates a pending cleanup and blocks/delays starts while it is still set?
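
To make the question a bit more concrete, here is a rough, untested-in-tree sketch of what such a flag could look like. Everything in it is an assumption for illustration only: the helper names (set_cleanup_pending, clear_cleanup_pending, wait_for_cleanup), the "$vmid.cleanup" file name, and the flag directory (a temp dir here so the snippet runs standalone) are hypothetical and not existing qemu-server code.

```perl
use strict;
use warnings;

use Errno qw(ENOENT);
use File::Temp qw(tempdir);
use Time::HiRes qw(sleep time);

# Hypothetical flag directory. A temp dir keeps the sketch self-contained;
# a real implementation would need a proper runtime directory.
my $flag_dir = tempdir(CLEANUP => 1);

sub cleanup_flag_path {
    my ($vmid) = @_;
    return "$flag_dir/$vmid.cleanup";
}

# Would be set once the guest shutdown is detected, before cleanup starts.
sub set_cleanup_pending {
    my ($vmid) = @_;
    my $path = cleanup_flag_path($vmid);
    open(my $fh, '>', $path) or die "unable to create '$path' - $!\n";
    close($fh);
    return;
}

# Would be called once cleanup is done. Returns 1 if the flag was set,
# 0 if there was nothing to clear.
sub clear_cleanup_pending {
    my ($vmid) = @_;
    my $path = cleanup_flag_path($vmid);
    return 1 if unlink($path);
    return 0 if $! == ENOENT;
    die "unable to remove '$path' - $!\n";
}

# Would be called in the start path: delay while a cleanup is pending.
# Returns 1 once the flag is gone, 0 if it is still set after $timeout seconds.
sub wait_for_cleanup {
    my ($vmid, $timeout) = @_;
    my $deadline = time() + $timeout;
    while (-e cleanup_flag_path($vmid)) {
        return 0 if time() >= $deadline;
        sleep(0.1);
    }
    return 1;
}
```

The start path could then do something like `die "cleanup for VM $vmid still pending\n" if !wait_for_cleanup($vmid, 10);` (the 10 seconds being just as arbitrary as the 30 above). A stale flag left behind by a crashed cleanup would need handling as well, which this sketch ignores.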
