[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-8643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wido den Hollander closed CLOUDSTACK-8643.
------------------------------------------
    Resolution: Won't Fix

> Helper for KVM High Availability
> --------------------------------
>
>                 Key: CLOUDSTACK-8643
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8643
>             Project: CloudStack
>          Issue Type: Improvement
>      Security Level: Public (Anyone can view this level - this is the 
> default.) 
>          Components: KVM, Management Server
>         Environment: KVM hypervisors
>            Reporter: Wido den Hollander
>              Labels: fence, high-availability, kvm, libvirt
>             Fix For: Future
>
>
> When running KVM with NFS storage all Agents will write a heartbeat to the 
> NFS.
> Should an Agent go down, heartbeats may still be written even if libvirt 
> has died.
> Using these heartbeats the Management Server can ask other KVM Agents if the 
> other server is still beating. If not, it can fence it.
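The heartbeat check described above could be sketched roughly like this (the file path and freshness threshold are my assumptions, not the actual CloudStack implementation):

```python
# Hypothetical sketch of an NFS heartbeat check; path and threshold are
# illustrative, not CloudStack's real values.
import os
import time

def peer_still_beating(heartbeat_path, max_age=60.0):
    """Treat the peer as alive if its heartbeat file on the shared NFS
    mount was touched within the last `max_age` seconds."""
    try:
        age = time.time() - os.path.getmtime(heartbeat_path)
    except OSError:
        return False  # no heartbeat file: peer never started or was fenced
    return age <= max_age
```

Another Agent (or the Management Server) would call this against the suspect host's heartbeat file before deciding to fence.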
> While this works, I've also encountered scenarios where you run without NFS 
> and still want the HA investigators to work.
> My proposal would be an Agent Helper running NEXT to the Agent itself.
> A simple Python daemon running a basic HTTP server which queries libvirt 
> every X seconds about:
> * Running Instances
> * Storage pools
> It keeps this in memory, so that even when libvirt goes down it knows what 
> the last state was.
> Using the Qemu Monitor sockets we can actually see if the guests we have in 
> memory are still online.
> If they are we simply keep the list.
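A probe along these lines could check a guest over its QEMU Monitor Protocol socket. This is a hedged sketch: the socket path is an assumption, and since libvirt normally holds the monitor itself, a real helper would need a dedicated QMP socket configured per guest.

```python
# Hypothetical QMP liveness probe; the per-guest socket path is an assumption.
import json
import os
import socket

def guest_alive(qmp_socket_path, timeout=2.0):
    """Return True if a QMP 'query-status' reports the guest as running."""
    if not os.path.exists(qmp_socket_path):
        return False
    try:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        sock.connect(qmp_socket_path)
        stream = sock.makefile("rw")
        stream.readline()  # discard the QMP greeting banner
        stream.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
        stream.flush()
        stream.readline()  # capabilities ack
        stream.write(json.dumps({"execute": "query-status"}) + "\n")
        stream.flush()
        reply = json.loads(stream.readline())
        sock.close()
        return reply.get("return", {}).get("status") == "running"
    except (OSError, ValueError):
        return False
```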
> Now, if an investigator comes by and wants to know if the host is still up, 
> it can ALSO ask the helper.
> The management server can ask the helper, but the other agents could as well.
> This doesn't work in all cases, e.g. where storage is lost. But an 
> additional helper would be useful to catch scenarios where the Agent itself 
> has become unresponsive.
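A minimal sketch of the proposed helper, assuming the python-libvirt bindings and an arbitrary port (8080); names, interval, and port are illustrative. The cache is the essential part: the last good answer from libvirt survives a libvirt crash and is what the HTTP endpoint serves.

```python
# Hypothetical sketch of the proposed helper daemon; class names, port and
# poll interval are assumptions, not an existing CloudStack component.
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

class StateCache:
    """Last-known host state, kept in memory so it survives a libvirt crash."""
    def __init__(self):
        self._lock = threading.Lock()
        self._state = {"instances": [], "pools": [], "updated": None}

    def update(self, instances, pools):
        with self._lock:
            self._state = {"instances": instances, "pools": pools,
                           "updated": time.time()}

    def snapshot(self):
        with self._lock:
            return dict(self._state)

CACHE = StateCache()

def poll_libvirt(interval=5):
    """Ask libvirt every `interval` seconds; on failure keep the old answer."""
    import libvirt  # python-libvirt bindings, assumed installed on the host
    while True:
        try:
            conn = libvirt.open("qemu:///system")
            instances = [dom.name() for dom in conn.listAllDomains()]
            pools = [pool.name() for pool in conn.listAllStoragePools()]
            conn.close()
            CACHE.update(instances, pools)
        except libvirt.libvirtError:
            pass  # libvirt is down: the cache keeps the last known state
        time.sleep(interval)

class StateHandler(BaseHTTPRequestHandler):
    """Serves the cached state as JSON to any investigator that asks."""
    def do_GET(self):
        body = json.dumps(CACHE.snapshot()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

def main():
    threading.Thread(target=poll_libvirt, daemon=True).start()
    HTTPServer(("", 8080), StateHandler).serve_forever()
```

Running main() would start the poller and the HTTP endpoint; the Management Server, or any other Agent acting as investigator, could then GET the host's last-known state.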



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
