Hi Sean,

I have a few questions, please.
Could you explain to me, as a non-developer, how CloudStack now determines that 
a host is actually 'Down'? Also, what happens if an operator is using block 
storage which virtlockd doesn't support, and how does CloudStack determine that 
virtlockd is installed, configured and enabled on the hosts?

I'm just trying to understand the use case, thanks.

paul.an...@shapeblue.com
www.shapeblue.com
53 Chandos Place, Covent Garden, London  WC2N 4HS, UK
@shapeblue
  
 


-----Original Message-----
From: Nux! [mailto:n...@li.nux.ro] 
Sent: 02 March 2018 14:41
To: dev <dev@cloudstack.apache.org>
Subject: Re: HA issues

Thanks, looking forward to having HA in Cloudstack again! :-)

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

----- Original Message -----
> From: "Sean Lair" <sl...@ippathways.com>
> To: "dev" <dev@cloudstack.apache.org>
> Sent: Thursday, 1 March, 2018 22:01:50
> Subject: RE: HA issues

> FYI Nux, I opened the following PR for the change we made in our 
> environment to get VM HA to work.  I referenced your ticket!
> 
> https://github.com/apache/cloudstack/pull/2474
> 
> 
> -----Original Message-----
> From: Nux! [mailto:n...@li.nux.ro]
> Sent: Monday, January 22, 2018 8:15 AM
> To: dev <dev@cloudstack.apache.org>
> Subject: Re: HA issues
> 
> Hi,
> 
> Installed and reinstalled, VM HA just does not work for me.
> In addition, if the HV going AWOL is hosting the systemvms, then they 
> also do not get restarted despite available HVs online.
> I've opened another ticket with logs:
> 
> https://issues.apache.org/jira/browse/CLOUDSTACK-10246
> 
> Happy to allow access to my rig if it helps.
> 
> I've disabled the firewall and whatnot, and also left out other bits of 
> network hardware just to keep it simpler; still no go.
> 
> --
> Sent from the Delta quadrant using Borg technology!
> 
> Nux!
> www.nux.ro
> 
> ----- Original Message -----
>> From: "Paul Angus" <paul.an...@shapeblue.com>
>> To: "dev" <dev@cloudstack.apache.org>
>> Sent: Saturday, 20 January, 2018 08:40:01
>> Subject: RE: HA issues
> 
>> No problem,
>> 
>> To be honest host-ha was developed *because* vm-ha was not reliable 
>> under a number of conditions, including a host failure.
>> 
>> paul.an...@shapeblue.com
>> www.shapeblue.com
>> 53 Chandos Place, Covent Garden, London  WC2N 4HS, UK @shapeblue
>>  
>> 
>> 
>> 
>> -----Original Message-----
>> From: Nux! [mailto:n...@li.nux.ro]
>> Sent: 19 January 2018 14:26
>> To: dev <dev@cloudstack.apache.org>
>> Subject: Re: HA issues
>> 
>> Hi Paul,
>> 
>> Thanks for checking. My compute offering is HA enabled, of course.
>> Host HA is disabled as well as OOBM.
>> 
>> 
>> I'll do the tests again on Monday and report back.
>> 
>> --
>> Sent from the Delta quadrant using Borg technology!
>> 
>> Nux!
>> www.nux.ro
>> 
>> ----- Original Message -----
>>> From: "Paul Angus" <paul.an...@shapeblue.com>
>>> To: "dev" <dev@cloudstack.apache.org>
>>> Sent: Friday, 19 January, 2018 14:10:06
>>> Subject: RE: HA issues
>> 
>>> Hey Nux,
>>> 
>>> I've been testing out the host-ha feature against a couple of physical 
>>> hosts.
>>> I've found that if the compute offering isn't ha-enabled, then the vm isn't
>>> restarted on the original host when it is rebooted, or on any other host. If
>>> the vm is ha-enabled, then the vm was restarted on the original host 
>>> when host ha restarted the host.
>>> 
>>> Can you double check that the instance was an ha-enabled one?
>>> 
>>> OR
>>> maybe the timeouts for the host-ha are too long and the vm-ha 
>>> timed out beforehand ...?
>>> 
>>> 
>>> 
>>> Kind regards,
>>> 
>>> Paul Angus
>>> 
>>> paul.an...@shapeblue.com
>>> www.shapeblue.com
>>> 53 Chandos Place, Covent Garden, London  WC2N 4HS, UK @shapeblue
>>>  
>>> 
>>> 
>>> 
>>> -----Original Message-----
>>> From: Nux! [mailto:n...@li.nux.ro]
>>> Sent: 17 January 2018 09:12
>>> To: dev <dev@cloudstack.apache.org>
>>> Subject: Re: HA issues
>>> 
>>> Right, sorry for using the terms interchangeably, I see what you mean.
>>> 
>>> I'll do further testing then as VM HA was also not working in my setup.
>>> 
>>> I'll be back.
>>> 
>>> --
>>> Sent from the Delta quadrant using Borg technology!
>>> 
>>> Nux!
>>> www.nux.ro
>>> 
>>> ----- Original Message -----
>>>> From: "Rohit Yadav" <rohit.ya...@shapeblue.com>
>>>> To: "dev" <dev@cloudstack.apache.org>
>>>> Sent: Wednesday, 17 January, 2018 09:09:19
>>>> Subject: Re: HA issues
>>> 
>>>> Hi Lucian,
>>>> 
>>>> 
>>>> The "Host HA" feature is entirely different from VM HA, however, 
>>>> they may work in tandem, so please stop using the terms 
>>>> interchangeably as it may cause the community to believe a 
>>>> regression has been caused.
>>>> 
>>>> 
>>>> The "Host HA" feature currently ships with only a "Host HA" provider 
>>>> for KVM, which is strictly tied to out-of-band management (IPMI for 
>>>> fencing, i.e. power off, and recovery, i.e. reboot) and NFS (as primary 
>>>> storage).
>>>> (We also have a provider for the simulator, but that's for 
>>>> coverage/testing purposes.)
>>>> 
>>>> 
>>>> Therefore, "Host HA" for KVM (+nfs) currently works only when OOBM is 
>>>> enabled.
>>>> The framework allows interested parties to write their own HA 
>>>> providers for a hypervisor that can use a different 
>>>> strategy/mechanism for fencing/recovery of hosts (including writing a 
>>>> non-IPMI based OOBM plugin) and a host/disk activity checker that is 
>>>> non-NFS based.
>>>> 
>>>> 
>>>> The "Host HA" feature ships disabled by default and does not cause 
>>>> any interference with VM HA. However, when enabled and configured 
>>>> correctly, it is a known limitation that when it is unable to 
>>>> successfully perform recovery or fencing tasks it may not trigger 
>>>> VM HA. We can discuss how to handle such cases (thoughts?). "Host HA"
>>>> would try a couple of times to recover and, failing to do so, it would 
>>>> eventually trigger a host fencing task. If it's unable to fence a 
>>>> host, it will indefinitely attempt to fence the host (the host 
>>>> state will be stuck at the fencing state in the cloud.ha_config table, 
>>>> for example) and alerts will be sent to the admin, who can do some 
>>>> manual intervention to handle such situations (if you've email/smtp 
>>>> enabled, you should see alert emails).
>>>> 
>>>> 
>>>> We can discuss how to improve and have a workaround for the case 
>>>> you've hit, thanks for sharing.
>>>> 
>>>> 
>>>> - Rohit
>>>> 
>>>> ________________________________
>>>> From: Nux! <n...@li.nux.ro>
>>>> Sent: Tuesday, January 16, 2018 10:42:35 PM
>>>> To: dev
>>>> Subject: Re: HA issues
>>>> 
>>>> Ok, reinstalled and re-tested.
>>>> 
>>>> What I've learned:
>>>> 
>>>> - HA only works now if OOB is configured; the old way of doing HA no 
>>>> longer applies - this can be good and bad, as not everyone has IPMIs
>>>> 
>>>> - HA only works if IPMI is reachable. I've pulled the cord on a HV 
>>>> and HA failed to do its thing, leaving me with a HV down along with 
>>>> all the VMs running there. That's bad.
>>>> I've opened this ticket for it:
>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-10234
>>>> 
>>>> Let me know if you need any extra info or stuff to test.
>>>> 
>>>> Regards,
>>>> Lucian
>>>> 
>>>> --
>>>> Sent from the Delta quadrant using Borg technology!
>>>> 
>>>> Nux!
>>>> www.nux.ro
>>>> 
>>>> 
>>>> rohit.ya...@shapeblue.com
>>>> www.shapeblue.com
>>>> 53 Chandos Place, Covent Garden, London  WC2N 4HS, UK @shapeblue
>>>>  
>>>> 
>>>> 
>>>> ----- Original Message -----
>>>>> From: "Nux!" <n...@li.nux.ro>
>>>>> To: "dev" <dev@cloudstack.apache.org>
>>>>> Sent: Tuesday, 16 January, 2018 11:35:58
>>>>> Subject: Re: HA issues
>>>> 
>>>>> I'll reinstall my setup and try again, just to be sure I'm working 
>>>>> on a clean slate.
>>>>>
>>>>> --
>>>>> Sent from the Delta quadrant using Borg technology!
>>>>>
>>>>> Nux!
>>>>> www.nux.ro
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Rohit Yadav" <rohit.ya...@shapeblue.com>
>>>>>> To: "dev" <dev@cloudstack.apache.org>
>>>>>> Sent: Tuesday, 16 January, 2018 11:29:51
>>>>>> Subject: Re: HA issues
>>>>>
>>>>>> Hi Lucian,
>>>>>>
>>>>>>
>>>>>> If you're talking about the new HostHA feature (with
>>>>>> KVM+nfs+ipmi), please refer to following docs:
>>>>>>
>>>>>> http://docs.cloudstack.apache.org/projects/cloudstack-administration/en/latest/hosts.html#out-of-band-management
>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA
>>>>>>
>>>>>>
>>>>>> We'll need you to look at the logs; perhaps create a JIRA ticket with 
>>>>>> the logs and details? If you saw an ipmi based reboot, then host-ha 
>>>>>> indeed tried to recover, i.e. reboot, the host. Once hostha has 
>>>>>> done its work it would schedule HA for the VM as soon as the recovery 
>>>>>> operation succeeds (we've simulator and kvm based marvin tests 
>>>>>> for such scenarios).
>>>>>>
>>>>>>
>>>>>> Can you see it making an attempt to schedule VM ha in the logs, or any failure?
>>>>>>
>>>>>>
>>>>>> - Rohit
>>>>>>
>>>>>> <https://cloudstack.apache.org>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ________________________________
>>>>>> From: Nux! <n...@li.nux.ro>
>>>>>> Sent: Tuesday, January 16, 2018 12:47:56 AM
>>>>>> To: dev
>>>>>> Subject: [4.11] HA issues
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I see there's a new HA engine for KVM and IPMI support which is 
>>>>>> really nice, however it seems hit and miss.
>>>>>> I have created an instance with HA offering, kernel panicked one 
>>>>>> of the hypervisors - after a while the server was rebooted via 
>>>>>> IPMI probably, but the instance never moved to a running 
>>>>>> hypervisor and even after the original hypervisor came back it 
>>>>>> was still left in Stopped state.
>>>>>> Are there any extra things I need to set up to have proper HA?
>>>>>>
>>>>>> Regards,
>>>>>> Lucian
>>>>>>
>>>>>> --
>>>>>> Sent from the Delta quadrant using Borg technology!
>>>>>>
>>>>>> Nux!
>>>>>> www.nux.ro
>>>>>>
>>>>>> rohit.ya...@shapeblue.com
>>>>>> www.shapeblue.com
>>>>>> 53 Chandos Place, Covent Garden, London  WC2N 4HS, UK
>>>>>> @shapeblue
