Re: [pve-devel] [PATCH qemu-server 1/1] qemu: add offline migration from dead node

Thomas Lamprecht Tue, 01 Apr 2025 05:55:21 -0700

On 01.04.25 at 13:37, Dominik Csapak wrote:
> Mhmm, what I meant here is that instructing the user to manually
> do 'mv some-path some-other-path' has more error potential (e.g.
> typos, misremembering nodenames/vmids/etc.) than e.g. clicking
> the vm on the offline node and pressing a button (or
> following a CLI tool output/options)


Which all have their error potential too, especially with hostnames
being free-form and not exclusive.

> I mentioned it because Fabian wrote we could maybe solve it with a
> cluster wide VM lock, I think restricting the moving to such a lock
> could work, under the assumption that the admin makes sure the offline
> node is and stays offline. (Which he has to do anyway)

Still not sure what this would provide; pmxcfs guarantees that the VMID
config can exist only once already anyway, so only one node can do a
move, and such moves can only happen if they would be equal to a file
rename, as any resource must be shared already to make this work.
Well, replication could be fixed up I guess, but that can be handled on
VM start too. Cannot think of anything else (without an in-depth
evaluation though) that an API can/should do differently for the actual
move itself. Doing some up-front checks is a different story, but that
could also result in a false sense of safety.

> It still improves the UX for that situation since it's then a
> provided/guided way vs. mv'ing files on the filesystem.

I'd not touch the move part though, at least for starters, just like the
upgrade checker scripts it should only assist.
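
As an illustration of "only assist", here is a minimal sketch of a checker
that runs a few up-front checks and prints the rename for the admin to run,
without ever moving anything itself. It assumes the usual pmxcfs layout of
nodes/<node>/qemu-server/<vmid>.conf below /etc/pve; the function name
plan_config_move and its parameters are hypothetical and not part of
qemu-server. It cannot verify that the source node is, and stays, offline,
so a clean result is no safety guarantee.

```python
import shlex
from pathlib import Path


def plan_config_move(pve_root, vmid, dead_node, target_node):
    """Check a planned VM config move and return (problems, command).

    Hypothetical helper: it never moves anything and cannot tell whether
    dead_node is really offline; that stays the admin's responsibility.
    """
    nodes_dir = Path(pve_root) / "nodes"
    rel = Path("qemu-server") / f"{vmid}.conf"
    src = nodes_dir / dead_node / rel
    dst = nodes_dir / target_node / rel

    problems = []
    if dead_node == target_node:
        problems.append("source and target node are the same")
    if not src.is_file():
        problems.append(f"no config found at {src}")
    if not dst.parent.is_dir():
        problems.append(f"target directory {dst.parent} does not exist")

    # pmxcfs should already prevent duplicate VMID configs; this only
    # reports what is visible in the directory tree we were given.
    others = sorted(
        p.parent.parent.name
        for p in nodes_dir.glob(f"*/qemu-server/{vmid}.conf")
        if p.parent.parent.name != dead_node
    )
    if others:
        problems.append(f"VMID {vmid} already has a config on: {', '.join(others)}")

    # Only suggest a command when no check failed; never execute it.
    command = None
    if not problems:
        command = f"mv {shlex.quote(str(src))} {shlex.quote(str(dst))}"
    return problems, command


if __name__ == "__main__":
    # Node names and VMID below are placeholders.
    found, cmd = plan_config_move("/etc/pve", 100, "deadnode", "alivenode")
    for problem in found:
        print(f"WARN: {problem}")
    if cmd:
        print(f"Checks passed, run manually if the node is really offline:\n  {cmd}")
```

The printed command is a plain rename inside the cluster filesystem, which
matches the point above that such a move is only valid when all resources
are already shared.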

> Just to clarify, I'm not for blindly implementing such an API call/CLI 
> tool/etc.
> but wanted to argue that we probably want to improve the UX of that situation
> as well as we can and offered my thoughts on how we could do it.
 
That's certainly fine; having it improved would be good, but I'm very wary
of hot takes and hand waving (not meaning you here, just in general). This
isn't a purge/remove/wipe of some resource on a working system, like wiping
disks or removing guests, as that can present the information to the admin
from a known good node that manages its state itself.
An unknown/dead node is literally breaking core clustering assumptions that
we build upon in a lot of places, IMO a very different thing. Mentioning this
as it might be easy to question why other destructive actions are exposed in
the UI.

And FWIW, if I should reconsider this it would be much easier to argue for
further integration if the basic assistant/checker guide/tool already
existed for some time and was somewhat battle tested, as that would allow a
much more confident evaluation of options, whatever those then look like;
some "scary" hint in the UI with lots of exclamation marks does not cut it
for me though, no offense to anybody.


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
