Re: [PATCH qemu-server v2] fix #7119: qm cleanup: wait for process exiting for up to 30 seconds

Fiona Ebner Mon, 16 Feb 2026 01:16:25 -0800

On 16.02.26 at 9:42 AM, Fabian Grünbichler wrote:
> On February 13, 2026 2:16 pm, Fiona Ebner wrote:
>> On 13.02.26 at 1:20 PM, Fabian Grünbichler wrote:
>>> On February 13, 2026 1:14 pm, Fiona Ebner wrote:
>>>> On 10.02.26 at 12:14 PM, Dominik Csapak wrote:
>>>>> +                my $timeout = 30;
>>>>> +                my $starttime = time();
>>>>>                  my $pid = PVE::QemuServer::check_running($vmid);
>>>>> -                die "vm still running\n" if $pid;
>>>>> +                warn "vm still running - waiting up to $timeout seconds\n" if $pid;
>>>>
>>>> While we're at it, we could improve the message here. Something like
>>>> 'QEMU process $pid for VM $vmid still running (or newly started)'.
>>>> Having the PID is useful information for developers/support engineers,
>>>> and the case where a new instance is started before the cleanup is done
>>>> is also possible.
>>>>
>>>> In fact, the case with the new instance is easily triggered by 'stop'
>>>> mode backups. Maybe we should fix that up first before adding a timeout
>>>> here?
>>>>
>>>> Feb 13 13:09:48 pve9a1 qm[92975]: <root@pam> end task UPID:pve9a1:00016B30:000CDF80:698F1485:qmshutdown:102:root@pam: OK
>>>> Feb 13 13:09:48 pve9a1 systemd[1]: Started 102.scope.
>>>> Feb 13 13:09:48 pve9a1 qmeventd[93079]: Starting cleanup for 102
>>>> Feb 13 13:09:48 pve9a1 qmeventd[93079]: trying to acquire lock...
>>>> Feb 13 13:09:48 pve9a1 vzdump[92895]: VM 102 started with PID 93116.
>>>> Feb 13 13:09:48 pve9a1 qmeventd[93079]:  OK
>>>> Feb 13 13:09:48 pve9a1 qmeventd[93079]: vm still running
>>>
>>> does this mean we should actually have some sort of mechanism similar to
>>> the reboot flag to indicate a pending cleanup, and block/delay starts if
>>> it is still set?
>>
>> Blocking/delaying starts is not what happens for the reboot flag/file:
> 
> that's not what I meant; the similarity was just "have a flag", not
> "have a flag that behaves identically" ;)
> 
> my proposal was:
> - add a flag that indicates cleanup is pending (similar to the flag
>   indicating a reboot is pending)
> - *handle that flag* in the start flow to wait for the cleanup to be
>   done before starting


Shouldn't we change the reboot flag to also do this?
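A minimal sketch of the pending-cleanup flag idea, assuming a per-VM flag file. The directory, the file name and all helper names below (`set_cleanup_pending`, `clear_cleanup_pending`, `wait_for_pending_cleanup`) are hypothetical and not existing qemu-server code. The flag is assumed to be set, e.g., when the QEMU process exits, and cleared once cleanup is done. The start flow waits for it to disappear, giving up after a timeout like the 30 seconds in the patch.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Time::HiRes qw(sleep time);

# Hypothetical location for the per-VM "cleanup pending" flag files; a
# temporary directory stands in for whatever real path would be used.
my $flag_dir = tempdir(CLEANUP => 1);

sub cleanup_flag_path {
    my ($vmid) = @_;
    return "$flag_dir/$vmid.cleanup-pending";
}

# Assumed to be called when the QEMU process exits, before cleanup runs.
sub set_cleanup_pending {
    my ($vmid) = @_;
    my $path = cleanup_flag_path($vmid);
    open(my $fh, '>', $path) or die "unable to create $path - $!\n";
    close($fh);
}

# Assumed to be called by the cleanup handler once it has finished.
sub clear_cleanup_pending {
    my ($vmid) = @_;
    unlink(cleanup_flag_path($vmid));
}

# Assumed to be called early in the start flow: wait while cleanup is
# pending, but die after $timeout seconds instead of blocking forever.
sub wait_for_pending_cleanup {
    my ($vmid, $timeout) = @_;
    my $starttime = time();
    while (-e cleanup_flag_path($vmid)) {
        die "cleanup for VM $vmid still pending after $timeout seconds\n"
            if time() - $starttime >= $timeout;
        sleep(0.1);
    }
    return 1;
}

# Example: cleanup already finished, so the start flow does not wait.
set_cleanup_pending(102);
clear_cleanup_pending(102);
wait_for_pending_cleanup(102, 30);
```

A real implementation would likely also need to coordinate setting and checking the flag with the existing per-VM lock (qmeventd acquires a lock in the logs above); this sketch does not address that race.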

>>> Feb 13 14:00:16 pve9a1 qm[124470]: <root@pam> starting task UPID:pve9a1:0001E639:001180FE:698F2060:qmreboot:102:root@pam:
>>> Feb 13 14:00:16 pve9a1 qm[124472]: <root@pam> starting task UPID:pve9a1:0001E63A:0011811E:698F2060:qmstart:102:root@pam:
>>> Feb 13 14:00:16 pve9a1 qm[124474]: start VM 102: UPID:pve9a1:0001E63A:0011811E:698F2060:qmstart:102:root@pam:
>>> [...]
>>> Feb 13 14:00:22 pve9a1 systemd[1]: 102.scope: Deactivated successfully.
>>> Feb 13 14:00:22 pve9a1 systemd[1]: 102.scope: Consumed 2min 3.333s CPU time, 2G memory peak.
>>> Feb 13 14:00:23 pve9a1 qmeventd[124565]: Starting cleanup for 102
>>> Feb 13 14:00:23 pve9a1 qmeventd[124565]: trying to acquire lock...
>>> Feb 13 14:00:23 pve9a1 qm[124470]: <root@pam> end task UPID:pve9a1:0001E639:001180FE:698F2060:qmreboot:102:root@pam: OK
>>> Feb 13 14:00:23 pve9a1 systemd[1]: Started 102.scope.
>>> Feb 13 14:00:23 pve9a1 qm[124474]: VM 102 started with PID 124620.
>>> Feb 13 14:00:23 pve9a1 qmeventd[124565]:  OK
>>> Feb 13 14:00:23 pve9a1 qmeventd[124565]: vm still running
>>
>> Currently, it's just indicating whether the cleanup handler should start
>> the VM again afterwards.
>>
>> On 13.02.26 at 1:22 PM, Dominik Csapak wrote:
>>> Sounds good. One possibility would be to do no cleanup at all when doing
>>> a stop mode backup?
>>> We already know we'll need the resources (pid/socket/etc. files, vGPUs, ...)
>>> again, don't we?
>>>
>>> Or is there some situation where that might not be the case?
>>
>> We do it for reboot (unless another start task sneaks in, like in my
>> example above), and off the top of my head, I don't see a good reason
>> why 'stop' mode backup should behave differently from a reboot (for
>> running VMs). It even applies pending changes just like a reboot right now.
> 
> but what about external callers doing something like:
> 
> - stop
> - do whatever
> - start
> 
> in rapid (automated) succession? those would still (possibly) trigger
> cleanup after "doing whatever", when the VM has already been started
> again? and in particular, if we skip cleanup for "our" cases of
> stop;start, it will be easy to introduce side effects in cleanup that
> break such usage?

I did not argue for skipping cleanup. I argued for being consistent with
reboot, where we (try to) do cleanup. I just wasn't sure it's really needed.

>> I'm not sure if there is an actual need to do cleanup or if we could

I guess the actual need is to have more consistent behavior.

>> also skip it when we are planning to spin up another instance right
>> away. But we do it for reboot, so the "safe" variant is also doing it
>> for 'stop' mode backup. History tells me it's been there since the
>> reboot functionality was added:
>> https://lists.proxmox.com/pipermail/pve-devel/2019-September/038988.html
