--- Begin Message ---
Hello,

On 14/02/2025 13:42, Fabian Grünbichler wrote:
AFAICT from the description above (not looking at code or actually testing anything), issues on 
your storage layer should be ruled out. But it still leaves issues with anything else, e.g. any 
long-running task (either by PVE or by the admin) that involves an HA-managed guest is at risk of
being "split-brained". In a regular (HA) setup, another node will only recover the config 
(and thus ownership) of the guest once the requisite timeouts have passed, which means it *knows* 
the failed node must have fenced itself. In your setup, this is not the case anymore - the 
non-quorate node still has the VM config (since it is not quorate, it cannot notice the 
"theft" of the config by the HA stack running on the quorate partition of the cluster) 
and thus (from a local point of view) at least RO ownership of that guest. Depending on the 
sequence of events, such a task might have passed a quorum check earlier and not yet reached the 
next such check, and thus even think it still has full ownership and act accordingly! Obviously, 
writes to your shared storage or to /etc/pve would be blocked, but that doesn't mean that nothing 
dangerous can happen (e.g., local or external state being corrupted or running out of sync by 
writes on/from two different nodes).
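To make that race concrete, here is a minimal sketch (hypothetical, not PVE code) of a long-running task that only re-checks quorum at step boundaries; any work started between two checks still completes even if quorum was lost in the meantime:

```python
# Hypothetical sketch of the race described above -- not PVE code.
# The task checks quorum only between steps, so a step that is already
# running when quorum is lost still finishes and its side effects land.

class Node:
    """Stand-in for the local cluster view; `quorate` flips on quorum loss."""
    def __init__(self):
        self.quorate = True

def long_running_task(node, steps):
    completed = []
    for step in steps:
        if not node.quorate:       # quorum check at step boundary only
            return completed       # abort: ownership may have moved
        completed.append(step())   # this step may already be "too late"
    return completed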

The only way to make this safe(r) would be to basically disallow any custom integration 
(to ensure no non-PVE tasks are running) and kill the whole PVE stack on quorum loss, 
including any spawned tasks and pmxcfs. At that point, all the configs and API would 
become unavailable as well, so the risk of something/somebody misinterpreting anything 
should become zero - if there is no information, nothing can be misinterpreted after all 
;) This would mean basically mean "downgrading" a PVE+StorPool node to a 
StorPool node on quorum loss, which is your intended semantics (I think?).

This approach does come with a new problem though - once this node rejoins the 
cluster, you'd need to bring up all of the PVE stack again in an orderly 
fashion.
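One way to sketch that teardown and the later bring-up (hypothetical; the unit names are the stock PVE services, but treat the ordering and the whole procedure as an assumption, not a supported recipe) is to stop the task-spawning daemons first and pmxcfs last, then start in the reverse order on rejoin:

```python
# Hypothetical sketch -- ordered "downgrade" of a PVE node on quorum
# loss, and the reverse bring-up on rejoin. The unit names are the
# stock PVE services; whether stopping them like this is safe in a
# given setup is an assumption, not a tested procedure.
import subprocess

TEARDOWN_ORDER = [
    "pvedaemon",    # API workers and the tasks they spawned
    "pveproxy",     # web/API frontend
    "pvestatd",     # status daemon
    "pve-cluster",  # pmxcfs; /etc/pve becomes unavailable after this
]

def stop_commands(services=TEARDOWN_ORDER):
    """systemctl invocations for the downgrade, without executing them."""
    return [["systemctl", "stop", svc] for svc in services]

def start_commands(services=TEARDOWN_ORDER):
    """Bring-up on rejoin: reverse order, so pmxcfs comes back first."""
    return [["systemctl", "start", svc] for svc in reversed(services)]

def run_all(commands):
    for cmd in commands:
        subprocess.run(cmd, check=False)  # best effort; handle errors in practice
```

Building the command lists separately from executing them keeps the ordering testable and makes it easy to review exactly what a quorum-loss event would do before wiring it to a watchdog.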

I hope the above explains why and how PVE is using self-fencing via watchdogs, and the 
implications of disabling that while keeping HA "enabled". If something is 
unclear or you have more questions, please reach out!

Thank you for the detailed feedback and helpful explanations. Your suggestion is essentially what we had in mind with the "automatic recovery" idea, and it seems like the correct direction for the watchdog after separating it from the plugin.

Best regards,

--
Ivaylo Markov
Quality & Automation Engineer
StorPool Storage
https://www.storpool.com


--- End Message ---
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
