> Denis Kanchev <denis.kanc...@storpool.com> wrote on 02.06.2025 11:18 CEST:
> 
> 
> My bad :) in terms of Proxmox it must be handing over the storage control - 
> the storage plugin function activate_volume() is called in our case, which 
> moves the storage to the new VM.
> So no data is moved across the nodes and only the volumes get re-attached.
> Thanks for the detailed information

okay!

so you basically special-case this "volume is active on two nodes" situation, 
which should only happen during a live migration, and that somehow runs into an 
issue when the migration is aborted, because of a suspected race somewhere?

as part of a live migration, the sequence should be:

node A: migration starts
node A: start request for target VM on node B (over SSH)
node B: `qm start ..` is called
node B: qm start will activate volumes
node B: qm start returns
node A: actual migration (VM state transfer) starts
node A/B: some fatal error
node A: cancel migration (via QMP/the source VM running on node A)
node A: request to stop target VM on node B (over SSH)
node B: `qm stop ..` is called
node B: qm stop will deactivate volumes
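
to double-check that this is also the ordering you see on your nodes, you could 
log each activation/deactivation with a timestamp and host name and compare the 
resulting trace against the sequence above. a rough sketch (the helper name and 
log path are just placeholders, it would need to be called from your plugin's 
activate_volume/deactivate_volume):

    use strict;
    use warnings;
    use Sys::Hostname qw(hostname);
    use POSIX qw(strftime);

    # placeholder helper - call it from activate_volume()/deactivate_volume()
    sub _trace_volume_event {
        my ($event, $storeid, $volname) = @_;
        my $ts = strftime('%F %T', localtime);
        # log path is arbitrary, pick whatever is convenient
        if (open(my $fh, '>>', '/var/log/pve-volume-trace.log')) {
            print $fh "$ts " . hostname() . " $event $storeid:$volname\n";
            close($fh);
        }
    }

comparing such traces from node A and node B should show whether an activation 
happens outside of the expected `qm start`/`qm stop` window.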

I am not sure where another activate_volume call could happen after node A has 
started the migration? at that point, node A still has control over the VM 
(ID), so nothing in PVE should operate on it other than the select calls made 
as part of the migration, which at that point are basically only querying the 
migration status and error handling..

it would still be good to know what actually got OOM-killed in your case.. was 
it the `qm start`? was it the `kvm` process itself? something else entirely?

if you can reproduce the issue, you could also add logging in activate_volume 
to find out the exact call path (e.g., log the call stack somewhere), maybe 
that helps find the exact scenario that you are seeing..
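
something like this inside your plugin should be enough (rough sketch only - 
the log path is arbitrary, and the signature shown is the generic storage 
plugin one, adapt it to your actual activate_volume):

    use strict;
    use warnings;
    use Carp qw(longmess);

    sub activate_volume {
        my ($class, $storeid, $scfg, $volname, $snapname, $cache) = @_;

        # append the full caller stack so the triggering code path
        # (migration, qm start, API call, ...) can be identified later
        if (open(my $fh, '>>', '/var/log/activate_volume-trace.log')) {
            print $fh scalar(localtime()) . " activate_volume $storeid:$volname\n";
            print $fh longmess('called from') . "\n";
            close($fh);
        }

        # ... the existing activation logic of the plugin continues here ...
    }

the resulting stack traces should tell you whether the extra activation comes 
from the migration code itself or from some other task running in parallel.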


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
