I performed VM HA sanity checks against two KVM CentOS 7 hosts in a cluster and 
was not able to reproduce any regression.


Without the "Host HA" feature, I deployed a few HA-enabled VMs on KVM host2 and 
killed it (powered it off). After a few minutes of CloudStack attempting to find 
out why the host (KVM agent) had timed out, CloudStack kicked off the 
investigators, which eventually led the KVM fencers to run. A VM HA job was then 
kicked off to start those VMs on host1, and KVM host2 was put into the "Down" state.
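To make the sequence above concrete, here is a rough Python sketch of the VM HA 
flow as described: agent timeout, then investigators, then fencers, then a 
restart on another host. It is a simplified model, not CloudStack's actual (Java) 
implementation. The Vm and Host classes and the handle_agent_timeout function are 
hypothetical, and investigators/fencers are modelled as plain callables.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Vm:
    name: str
    ha_enabled: bool
    host: str
    state: str = "Running"


@dataclass
class Host:
    name: str
    state: str = "Up"
    vms: List[Vm] = field(default_factory=list)


# Hypothetical shapes: an investigator answers True (alive), False (down) or
# None (cannot tell); a fencer answers True if it fenced the host.
Investigator = Callable[[Host], Optional[bool]]
Fencer = Callable[[Host], bool]


def handle_agent_timeout(host: Host, target: Host,
                         investigators: List[Investigator],
                         fencers: List[Fencer]) -> List[str]:
    """Simplified, hypothetical model of VM HA after a host agent times out.

    Returns the names of the HA-enabled VMs restarted on the target host.
    """
    # Ask investigators in order until one gives a definite answer.
    verdict: Optional[bool] = None
    for investigate in investigators:
        verdict = investigate(host)
        if verdict is not None:
            break
    if verdict is not False:
        return []  # host alive or status unknown: restart nothing

    # Only restart VMs elsewhere once some fencer has succeeded.
    if not any(fence(host) for fence in fencers):
        return []

    host.state = "Down"
    restarted = []
    for vm in list(host.vms):
        if vm.ha_enabled:  # non-HA VMs are left untouched in this sketch
            host.vms.remove(vm)
            vm.host = target.name
            target.vms.append(vm)
            restarted.append(vm.name)
    return restarted
```

In this sketch nothing is restarted unless an investigator reports the host as 
down and at least one fencer succeeds. Fencing before restarting is what keeps the 
same VM from running on two hosts at once.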


- Rohit

<https://cloudstack.apache.org>



________________________________

rohit.ya...@shapeblue.com
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS UK
@shapeblue

From: Rohit Yadav
Sent: Wednesday, January 17, 2018 2:39:19 PM
To: dev
Subject: Re: HA issues


Hi Lucian,


The "Host HA" feature is entirely different from VM HA; however, they may work 
in tandem. Please stop using the terms interchangeably, as doing so may lead the 
community to believe a regression has been introduced.


The "Host HA" feature currently ships with only one "Host HA" provider, for KVM, 
which is strictly tied to out-of-band management (IPMI for fencing, i.e. power 
off, and for recovery, i.e. reboot) and to NFS (as primary storage). (We also 
have a provider for the simulator, but that is for coverage/testing purposes.)


Therefore, "Host HA" for KVM (+NFS) currently works only when OOBM is enabled. 
The framework allows interested parties to write their own HA providers for a 
hypervisor that use a different strategy/mechanism for fencing/recovery of 
hosts (including writing a non-IPMI-based OOBM plugin) and a host/disk activity 
checker that is not NFS-based.
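A minimal sketch of what a pluggable provider boils down to, assuming a provider 
only needs three hooks: an activity check, a recovery action and a fencing 
action. The HAProvider and OobmNfsProvider names, the method names and the 
"reboot"/"power-off" action strings are hypothetical and do not reproduce 
CloudStack's Java provider interface. The OOBM and NFS calls are injected 
callables rather than real clients.

```python
from abc import ABC, abstractmethod
from typing import Callable


class HAProvider(ABC):
    """Hypothetical hooks a hypervisor-specific Host HA provider would supply."""

    @abstractmethod
    def has_activity(self, host: str) -> bool:
        """Activity check, e.g. heartbeat/disk activity seen on shared storage."""

    @abstractmethod
    def recover(self, host: str) -> bool:
        """Try to bring the host back, e.g. an out-of-band reboot."""

    @abstractmethod
    def fence(self, host: str) -> bool:
        """Ensure the host can no longer run VMs, e.g. an out-of-band power off."""


class OobmNfsProvider(HAProvider):
    """Mimics the shipped KVM provider's dependencies: OOBM actions plus an
    NFS-based activity check. Both are injected stand-ins, not real clients."""

    def __init__(self, oobm_enabled: bool,
                 nfs_activity: Callable[[str], bool],
                 oobm_action: Callable[[str, str], bool]) -> None:
        self.oobm_enabled = oobm_enabled
        self.nfs_activity = nfs_activity
        self.oobm_action = oobm_action

    def has_activity(self, host: str) -> bool:
        return bool(self.nfs_activity(host))

    def recover(self, host: str) -> bool:
        # Without OOBM there is no way to reboot the host, so recovery fails.
        return self.oobm_enabled and bool(self.oobm_action(host, "reboot"))

    def fence(self, host: str) -> bool:
        # Likewise, fencing needs an out-of-band power off.
        return self.oobm_enabled and bool(self.oobm_action(host, "power-off"))
```

The sketch also shows why this kind of provider cannot help when OOBM is 
disabled: both recover() and fence() simply return False. A provider using a 
non-IPMI mechanism would subclass HAProvider and plug in its own fencing and 
activity checks.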


The "Host HA" feature ships disabled by default and does not interfere with VM 
HA. However, when it is enabled and configured correctly, it is a known 
limitation that if it is unable to successfully perform the recovery or fencing 
tasks, it may not trigger VM HA. We can discuss how to handle such cases 
(thoughts?). "Host HA" tries to recover the host a couple of times and, failing 
that, eventually triggers a host fencing task. If it is unable to fence the 
host, it keeps attempting to fence it indefinitely (for example, the host state 
will be stuck in the fencing state in the cloud.ha_config table), and alerts are 
sent to the admin, who can intervene manually to handle such situations (if you 
have email/SMTP configured, you should see alert emails).
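The escalation described above (a few recovery attempts, then fencing, retried 
while alerting the admin) can be sketched as follows. This is a simplified, 
hypothetical model: run_host_ha, its parameters, the default of two recovery 
attempts and the returned state names are illustrative assumptions. Unlike the 
real feature, it caps the fencing attempts so the sketch always terminates. It 
also assumes VM HA is scheduled after either a successful recovery or a 
successful fence.

```python
from typing import Callable


def run_host_ha(host: str,
                recover: Callable[[str], bool],
                fence: Callable[[str], bool],
                schedule_vm_ha: Callable[[str], None],
                alert: Callable[[str], None],
                max_recovery_attempts: int = 2,
                max_fence_attempts: int = 5) -> str:
    """Hypothetical model of Host HA escalation; returns the final state."""
    # Step 1: try to recover (e.g. an out-of-band reboot) a limited number of times.
    for _ in range(max_recovery_attempts):
        if recover(host):
            schedule_vm_ha(host)  # assumption: VM HA follows a successful recovery
            return "Recovered"

    # Step 2: recovery gave up, so escalate to fencing (e.g. a power off).
    # The real feature keeps retrying indefinitely; this sketch is capped.
    for attempt in range(1, max_fence_attempts + 1):
        if fence(host):
            schedule_vm_ha(host)  # assumption: VM HA follows a successful fence
            return "Fenced"
        alert(f"Unable to fence {host} (attempt {attempt}), manual intervention needed")

    # Step 3: still not fenced; the host stays in the fencing state and
    # VM HA is never scheduled.
    return "Fencing"
```

The final "Fencing" return corresponds to the stuck case: alerts go out, but 
schedule_vm_ha is never called, which is the known limitation mentioned above.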


We can discuss how to improve and have a workaround for the case you've hit, 
thanks for sharing.


- Rohit

________________________________
From: Nux! <n...@li.nux.ro>
Sent: Tuesday, January 16, 2018 10:42:35 PM
To: dev
Subject: Re: HA issues

Ok, reinstalled and re-tested.

What I've learned:

- HA now only works if OOB is configured; the old way of doing HA no longer 
applies. This can be good and bad, as not everyone has IPMI.

- HA only works if IPMI is reachable. I pulled the power cord on an HV and HA 
failed to do its thing, leaving me with an HV down along with all the VMs 
that were running on it. That's bad.
I've opened this ticket for it:
https://issues.apache.org/jira/browse/CLOUDSTACK-10234

Let me know if you need any extra info or stuff to test.

Regards,
Lucian

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

----- Original Message -----
> From: "Nux!" <n...@li.nux.ro>
> To: "dev" <dev@cloudstack.apache.org>
> Sent: Tuesday, 16 January, 2018 11:35:58
> Subject: Re: HA issues

> I'll reinstall my setup and try again, just to be sure I'm working on a clean
> slate.
>
> ----- Original Message -----
>> From: "Rohit Yadav" <rohit.ya...@shapeblue.com>
>> To: "dev" <dev@cloudstack.apache.org>
>> Sent: Tuesday, 16 January, 2018 11:29:51
>> Subject: Re: HA issues
>
>> Hi Lucian,
>>
>>
>> If you're talking about the new Host HA feature (with KVM+NFS+IPMI), please
>> refer to the following docs:
>>
>> http://docs.cloudstack.apache.org/projects/cloudstack-administration/en/latest/hosts.html#out-of-band-management
>>
>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA
>>
>>
>> We'll need you to look at the logs; perhaps create a JIRA ticket with the logs
>> and details? If you saw an IPMI-based reboot, then Host HA did try to recover
>> (i.e. reboot) the host. Once Host HA has done its work, it schedules HA for the
>> VMs as soon as the recovery operation succeeds (we have simulator- and
>> KVM-based Marvin tests for such scenarios).
>>
>>
>> Can you see it attempting to schedule VM HA in the logs, or any failures?
>>
>>
>> - Rohit
>>
>> ________________________________
>> From: Nux! <n...@li.nux.ro>
>> Sent: Tuesday, January 16, 2018 12:47:56 AM
>> To: dev
>> Subject: [4.11] HA issues
>>
>> Hi,
>>
>> I see there's a new HA engine for KVM and IPMI support, which is really nice;
>> however, it seems hit and miss.
>> I created an instance with an HA offering and kernel-panicked one of the
>> hypervisors. After a while the server was rebooted, probably via IPMI, but the
>> instance never moved to a running hypervisor, and even after the original
>> hypervisor came back it was still left in the Stopped state.
>> Are there any extra things I need to set up to get proper HA?
>>
>> Regards,
>> Lucian
>>