Nux created CLOUDSTACK-8943:
-------------------------------

             Summary: KVM HA is broken, let's fix it
                 Key: CLOUDSTACK-8943
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8943
             Project: CloudStack
          Issue Type: Bug
      Security Level: Public (Anyone can view this level - this is the default.)
         Environment: Linux distros with KVM/libvirt
            Reporter: Nux


Currently KVM HA works by monitoring an NFS-based heartbeat file, and it 
frequently fails whenever this network share slows down, causing the hypervisors 
to reboot.
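To make the failure mode concrete, here is a minimal Python sketch of a file-based heartbeat with self-fencing. It is a hypothetical model, not CloudStack's actual KVM HA code: `HeartbeatMonitor`, `host_presumed_dead`, and the timeout values are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class HeartbeatMonitor:
    """Hypothetical model of an NFS-file heartbeat with self-fencing.

    Assumptions (illustrative only, not CloudStack's real values or code):
    - the host periodically writes a timestamp to a heartbeat file on NFS;
    - a write slower than `write_timeout` seconds counts as a failure;
    - after `max_failures` consecutive failures the host fences (reboots) itself.
    """
    write_timeout: float = 60.0
    max_failures: int = 1

    def should_fence(self, write_latencies: Iterable[float]) -> bool:
        consecutive = 0
        for latency in write_latencies:
            if latency > self.write_timeout:
                # A slow write is indistinguishable from a dead share here.
                consecutive += 1
                if consecutive >= self.max_failures:
                    return True
            else:
                consecutive = 0  # a timely write resets the failure counter
        return False


def host_presumed_dead(last_beat: float, now: float, stale_after: float) -> bool:
    """Peer-side check: a heartbeat older than `stale_after` seconds marks the host dead."""
    return now - last_beat > stale_after


if __name__ == "__main__":
    monitor = HeartbeatMonitor(write_timeout=60.0)
    # Two fast writes, then one slow write on a congested (but alive) share.
    print(monitor.should_fence([0.5, 1.2, 75.0]))
```

The model only ever sees write latency, so it cannot tell a share that is slow from one that is gone: a single write over the timeout fences a host whose VMs and other primary storage may be perfectly healthy.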

This can be particularly annoying when you have different kinds of primary 
storage in place that are working fine (e.g. people running Ceph).

Having to wait for the affected hypervisor that triggered this to come back up 
and declare that it is no longer running VMs is a bad idea; that hypervisor 
could require hours or days of maintenance!

This is embarrassing. How can we fix it? Ideas, suggestions? How do other 
hypervisors handle it?

Let's discuss, test, implement. :)






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
