On 01.04.25 at 11:52, Fabian Grünbichler wrote:
>> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> wrote on
>> 24.03.2025 12:15 CET:
>> verify that node is dead from corosync && ssh
>> and move config file from /etc/pve directly
>
> there are two reasons why this is dangerous and why we haven't exposed
> anything like this in the API or UI..
>
> the first one is the risk of corruption - just because a (supposedly dead)
> node X is not reachable from your local node A doesn't mean it isn't still
> running. if it is still running, any guests that were already started before
> might still be running as well. any guests still running might still be able
> to talk to shared storage. unless there are other safeguards in place (like
> MMP, which is not a given for all storages), this can easily corrupt guest
> volumes completely if you attempt to recover and start such a guest. HA
> protects against this - node X will fence itself before node A attempts
> recovery, so there is never a situation where both nodes try to write to
> the same volume. just checking whether other cluster nodes can still connect
> to node X is not enough by any stretch to make this safe.
>
> the second one is ownership of a VM/CT - PVE relies on node-local locking of
> guests to avoid contention. this only works because each guest/VMID has a
> clear owner - the node where the config currently resides. if you steal a
> config by moving it, you are violating this assumption. we only change the
> owner of a VMID in two scenarios, with careful consideration of the
> implications:
> - when doing a migration, which is initiated by the source node that
>   currently owns the guest, so it willingly hands over control to the new
>   node, which is safe by definition (no stealing involved and proper locking
>   in place)
> - when doing an HA recovery, which is protected by the HA locks and the
>   watchdog - we know that the original node has been fenced before the
>   recovery happens, and we know it cannot do anything with the guest before
>   it has been informed about the recovery (this is ensured by the design of
>   the HA locks).
>
> your code below is not protected by the HA stack, so there is a race
> involved - the node where the "deadnode migration" is initiated cannot lock
> the VMID in a way that the supposedly "dead" node knows about (config
> locking for guests is node-local, so it can only happen on the node that
> "owns" the config; anything else doesn't make sense/doesn't protect
> anything). if the "dead" node rejoins the cluster at the right moment, it
> still owns the VMID/config and can start it, while the other node thinks it
> can still steal it. there is also no protection against initiating multiple
> deadnode migrations in parallel for the same VMID, although of course all
> but one will fail because pmxcfs ensures the VMID.conf only exists under a
> single node. we'd need to give up node-local guest locking to close this
> gap, which is a no-go for performance reasons.
>
> I understand that this would be convenient to expose, but it is also really
> dangerous without understanding the implications - and once there is an
> option to trigger it via the UI, no matter how many disclaimers you put on
> it, people will press that button, mess up, and blame PVE. at the same time
> there is an actual implementation that safely implements it - it's called HA
> 😉 so I'd rather spend some time focusing on improving the robustness of our
> HA stack, rather than adding such a footgun.
+1 to all of the above.
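
To make the window concrete: here is a minimal, purely illustrative sketch
(plain Python, not PVE/Perl code, and every name in it is made up) of the
check-then-act race described above. Node A's reachability check and the
config move are not atomic, and since A holds no lock that the supposedly
dead node X knows about, X can rejoin and start the guest in between.

    # toy model, NOT PVE code - illustrates the "deadnode migration" race only
    import threading
    import time

    class Cluster:
        def __init__(self):
            self.owner = "X"              # node that currently holds VMID.conf
            self.running = set()          # nodes that have started the guest
            self.lock = threading.Lock()  # stands in for pmxcfs consistency only

        def node_x_rejoins_and_starts(self):
            time.sleep(0.01)              # X comes back "at the right moment"
            with self.lock:
                if self.owner == "X":     # from X's view it still owns the config,
                    self.running.add("X") # so starting the guest is legitimate

        def node_a_deadnode_migration(self):
            looks_dead = True             # corosync/ssh checks passed earlier on A
            time.sleep(0.02)              # window between the check and the move
            with self.lock:
                if looks_dead and self.owner == "X":
                    self.owner = "A"      # "steal" VMID.conf based on stale info
            self.running.add("A")         # A starts the guest, unaware of X

    c = Cluster()
    t1 = threading.Thread(target=c.node_x_rejoins_and_starts)
    t2 = threading.Thread(target=c.node_a_deadnode_migration)
    t1.start(); t2.start(); t1.join(); t2.join()
    print(c.owner, c.running)             # can end up owner=A, running={'X', 'A'}

Both nodes end up writing to the same volume. The only things that close this
window are fencing plus the HA locks, i.e. exactly what the HA stack already
provides.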