[ https://issues.apache.org/jira/browse/CLOUDSTACK-8643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wido den Hollander closed CLOUDSTACK-8643.
------------------------------------------
    Resolution: Won't Fix

> Helper for KVM High Availability
> --------------------------------
>
>                 Key: CLOUDSTACK-8643
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8643
>             Project: CloudStack
>          Issue Type: Improvement
>      Security Level: Public (Anyone can view this level - this is the default.)
>          Components: KVM, Management Server
>         Environment: KVM hypervisors
>            Reporter: Wido den Hollander
>              Labels: fence, high-availability, kvm, libvirt
>             Fix For: Future
>
>
> When running KVM with NFS storage, all Agents write a heartbeat to the NFS.
> Should an Agent go down, it will still be writing heartbeats even if libvirt
> has died.
> Using these heartbeats, the Management Server can ask other KVM Agents whether
> the other server is still beating. If not, it can fence it.
> While this works, I've also encountered scenarios where you run without NFS
> and still want investigators.
> My proposal would be an Agent Helper running NEXT to the Agent itself.
> A simple Python daemon running a basic HTTP server which queries libvirt
> every X seconds about:
> * Running Instances
> * Storage pools
> It keeps this in memory, so that even when libvirt goes down it knows what
> the last state was.
> Using the Qemu Monitor sockets we can actually see if the guests we have in
> memory are still online.
> If they are, we simply keep the list.
> Now, if an investigator comes by and wants to know if the host is still up, it
> can ALSO ask the helper.
> The Management Server can ask the helper, but the other Agents could as well.
> This doesn't work in all cases, e.g. where storage is lost. But an additional
> helper would be useful to catch scenarios where the Agent itself became
> unresponsive.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
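
For illustration only, the Agent Helper described in the issue might look roughly like the sketch below. It is not code from CloudStack and was never part of this issue. It assumes the libvirt Python bindings are installed on the hypervisor. The names `poll_libvirt`, `HostStateCache`, `make_handler` and `serve`, the JSON layout, the port and the polling interval are all hypothetical. The Qemu Monitor socket check is left out to keep the sketch short.

```python
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def poll_libvirt(uri="qemu:///system"):
    """Ask libvirt for active instances and storage pools (raises on failure)."""
    import libvirt  # imported lazily so the cache logic runs without libvirt

    conn = libvirt.openReadOnly(uri)
    try:
        domains = conn.listAllDomains(libvirt.VIR_CONNECT_LIST_DOMAINS_ACTIVE)
        pools = conn.listAllStoragePools(
            libvirt.VIR_CONNECT_LIST_STORAGE_POOLS_ACTIVE)
        return {
            "instances": sorted(d.name() for d in domains),
            "pools": sorted(p.name() for p in pools),
        }
    finally:
        conn.close()


class HostStateCache:
    """Keeps the last state libvirt reported, even after libvirt goes away."""

    def __init__(self, poller=poll_libvirt):
        self._poller = poller
        self._lock = threading.Lock()
        self._state = {"instances": [], "pools": [],
                       "libvirt_ok": False, "last_success": None}

    def refresh(self):
        """Poll once; on failure keep the old lists and only flag libvirt."""
        try:
            fresh = self._poller()
        except Exception:
            with self._lock:
                self._state["libvirt_ok"] = False
            return False
        with self._lock:
            self._state = {**fresh, "libvirt_ok": True,
                           "last_success": time.time()}
        return True

    def snapshot(self):
        """Return a copy of the cached state for an investigator."""
        with self._lock:
            return dict(self._state)


def make_handler(cache):
    """Build a request handler that answers every GET with the cached state."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = json.dumps(cache.snapshot()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the daemon quiet in this sketch

    return Handler


def serve(cache, host="127.0.0.1", port=8080, interval=10.0):
    """Refresh the cache every `interval` seconds and serve it over HTTP."""

    def loop():
        while True:
            cache.refresh()
            time.sleep(interval)

    threading.Thread(target=loop, daemon=True).start()
    ThreadingHTTPServer((host, port), make_handler(cache)).serve_forever()
```

Calling `serve(HostStateCache())` would start the helper. An investigator could then issue a GET and inspect `libvirt_ok` and `last_success` to decide how much to trust the instance list.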