> Andrei Perapiolkin <andrei.perepiol...@open-e.com> wrote on 28.05.2025
> 16:49 CEST:
> 
>  
> Hi Fabian,
> 
> Thank you for your time dedicated to this issue.
> 
> >> My current understanding is that all assets related to snapshots should
> >> be removed when a volume is deactivated, is that correct?
> >> Or are all volumes and snapshots expected to be present across the entire
> >> cluster until they are explicitly deleted?
> 
> > I am not quite sure what you mean by "present" - do you mean "exist in an
> > activated state"?
> 
> Exists in an active state - activated.
> 
> 
> >> How should the cleanup tasks be triggered across the remaining nodes?
> 
> > it should not be needed 
> 
> Consider the following scenarios of live migration of a VM from 'node1' to
> 'node2':
> 
> 1. Error occurs on 'node2' resulting in partial activation

if an error occurs on the target node during phase2 (after the VM has been
started), the target VM will be stopped and any local disks allocated as
part of the migration will be cleaned up as well. stopping the VM
includes deactivating all its volumes.

> 2. Error occurs on 'node1' resulting in partial deactivation

you mean an error right between deactivating volume 1 and 2, when
control has already been handed over to node 2?

> 3. Error occurs on both 'node1' and 'node2' resulting in dangling 
> artifacts remain on both 'node1' and 'node2'

incomplete or partial error handling is of course always possible - some
kinds of errors are hard or impossible to recover from, after all.

> That might lead to partial activation (some artifacts might be created)
> and partial deactivation (some artifacts might remain uncleared).
> Now, suppose the user unlocks the VM (if it was previously locked due to 
> the failure) and proceeds with another migration attempt, this time to 
> 'node3', hoping for success.
> What would happen to the artifacts on 'node1' and 'node2' in such a case?

those on node2 would be unaffected (the new migration task doesn't know 
about the previous one). so you might have orphaned disks there in case of
local storage, or still activated shared volumes in case of shared storage.

on node1 everything should be handled correctly.
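
for illustration, a leftover activation like that could in principle be
cleaned up manually with something along these lines (a rough sketch
assuming the PVE::Storage module interface; the volume ID is just a
placeholder):

    #!/usr/bin/perl
    use strict;
    use warnings;

    use PVE::Storage;

    # placeholder volume ID of the leftover volume on node2 - adjust as needed
    my $volid = 'mystorage:vm-100-disk-0';

    # read the cluster-wide storage configuration
    my $cfg = PVE::Storage::config();

    # this ends up calling the plugin's deactivate_volume() for the given volume
    PVE::Storage::deactivate_volumes($cfg, [$volid]);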

> Regarding the 'path' function:
> 
> In my case it is difficult to deterministically predict the actual path of
> the device.
> Determining this path essentially requires activating the volume.
> This approach is questionable, as it implies calling activate_volume 
> without Proxmox being aware that the activation has occurred.
> What would happen if a failure occurs within Proxmox before it reaches 
> the stage of officially activating the volume?

we treat activating a volume as idempotent, so this should not cause any
damage, unless you activate volumes outside of a migration on nodes that
are not currently "owning" that guest. your storage plugin is allowed
to activate volumes internally if needed.

but given that path() is called quite often, you'd have to ensure that
activating a volume is not too expensive (usually some kind of fast path
is used that is effectively a nop if the volume has already been
activated).
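
a minimal sketch of such a fast path, assuming the usual storage plugin
method signature - find_device_path() and attach_volume_on_node() are
hypothetical plugin-internal helpers, not part of the plugin API:

    sub activate_volume {
        my ($class, $storeid, $scfg, $volname, $snapname, $cache) = @_;

        # fast path: if the block device is already present on this node,
        # activation is effectively a nop
        my $dev = $class->find_device_path($scfg, $volname, $snapname);
        return if defined($dev) && -b $dev;

        # slow path: actually attach/export the volume on this node
        $class->attach_volume_on_node($scfg, $volname, $snapname);

        return;
    }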

> Additionally, I believe that providing the 'physical path' of a resource
> that is not yet present (i.e. activated and usable) is a questionable
> practice.
> This creates a risk, as there is always a temptation to use the path 
> directly, under the assumption that the resource is ready.

yes, but it has advantages as well:
- we don't have to carry the path through a call stack, but can just
  retrieve it where needed without the extra cost of doing another
  activation
- path also serves other purposes which don't require activation at all
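
as an illustration of the second point, a path() implementation can derive
the expected device path purely from the storage configuration and the
volume name, without activating anything (rough sketch; the /dev/mapper
layout and the 'pool' config key are made up, only the method signature
follows the usual plugin convention):

    sub path {
        my ($class, $scfg, $volname, $storeid, $snapname) = @_;

        my ($vtype, $name, $vmid) = $class->parse_volname($volname);

        # build the expected device path from config only; the device may
        # not exist yet - callers still have to activate_volume() before use
        my $path = "/dev/mapper/$scfg->{pool}-$name";
        $path .= "-snap-$snapname" if defined($snapname);

        return wantarray ? ($path, $vmid, $vtype) : $path;
    }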

> This approach assumes that all developers are fully aware that a given 
> $path might merely be a placeholder, and that additional activation is 
> required before use.
> The issue becomes even more complex in larger code bases that integrate
> third-party software such as QEMU.
> 
> I might be mistaken, but during my experiments with the 'path' function, 
> I encountered an error where the virtualization system failed to open a 
> volume that had not been fully activated.
> Perhaps this has been addressed in newer versions, but previously, there 
> appeared to be a race condition between volume activation and QEMU 
> attempting to operate on the expected block device path.

bugs are always possible - if you can find more details about what happened
there, I'd be happy to take a look.


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
