--- Begin Message ---
Greetings,

I was pointed here to discuss the StorPool storage plugin[0] with the dev team. If I understand correctly, there is a concern with our HA watchdog daemon, and I'd like to explain why it exists and how it works.

As a distributed storage system, StorPool has its own internal clustering mechanisms. It can run on networks that are independent of the PVE cluster network, and thus remain unaffected by network partitions or other problems that would cause the standard PVE watchdog to reboot a node. On HCI (compute + storage) nodes, such a reboot can interrupt the normal operation of the StorPool cluster, causing reduced performance or downtime that could be avoided if the host were not restarted. This is why we do our best to avoid host reboots across the different cloud management platforms we support.

Currently, when our daemon detects an unexpected exit of a resource manager, it sends SIGKILL to the PVE HA services and to the VMs running on the node, which should prevent two instances of the same VM from running at the same time. The PVE services and our block storage client daemon are then restarted as well.
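
As a rough illustration only (this is not the actual StorPool daemon code), the fencing step described above might look like the following Python sketch. The function name `fence_local_node`, its parameters, and the ordering (HA services before VMs) are assumptions made for the example; discovering the PIDs and restarting the PVE services and the block storage client are left out.

```python
import os
import signal
from typing import Callable, Iterable, List


def fence_local_node(
    ha_pids: Iterable[int],
    vm_pids: Iterable[int],
    kill: Callable[[int, int], None] = os.kill,
) -> List[int]:
    """Send SIGKILL to HA service PIDs, then VM PIDs.

    Returns the PIDs that were actually signalled. The `kill`
    parameter is injectable so the logic can be exercised without
    touching real processes (hypothetical design choice).
    """
    signalled: List[int] = []
    # Assumed order: stop the HA services first, then the VMs.
    for pid in list(ha_pids) + list(vm_pids):
        try:
            kill(pid, signal.SIGKILL)
            signalled.append(pid)
        except ProcessLookupError:
            # The process has already exited; nothing left to fence.
            continue
    return signalled
```

SIGKILL cannot be caught or ignored by the target process, which is presumably why it suits this kind of local fencing: the VMs stop without relying on any cooperation from the guest or the management stack.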

We're open to discussion and suggestions for our approach and implementation.

[0] https://github.com/storpool/pve-storpool

--
Ivaylo Markov
Quality & Automation Engineer
StorPool Storage
https://www.storpool.com



--- End Message ---
_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
