Hi Sean, I have a few questions please. Could you explain to me, as a non-developer, how CloudStack now determines that a host is actually 'Down'? Also, what happens if an operator is using block storage which virtlockd doesn't support, and how does CloudStack determine that virtlockd is installed, configured and enabled on the hosts?
I'm just trying to understand the use case, thanks.

paul.an...@shapeblue.com
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS UK
@shapeblue

-----Original Message-----
From: Nux! [mailto:n...@li.nux.ro]
Sent: 02 March 2018 14:41
To: dev <dev@cloudstack.apache.org>
Subject: Re: HA issues

Thanks, looking forward to having HA in Cloudstack again! :-)

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

----- Original Message -----
> From: "Sean Lair" <sl...@ippathways.com>
> To: "dev" <dev@cloudstack.apache.org>
> Sent: Thursday, 1 March, 2018 22:01:50
> Subject: RE: HA issues
> FYI Nux, I opened the following PR for the change we made in our
> environment to get VM HA to work. I referenced your ticket!
>
> https://github.com/apache/cloudstack/pull/2474
>
>
> -----Original Message-----
> From: Nux! [mailto:n...@li.nux.ro]
> Sent: Monday, January 22, 2018 8:15 AM
> To: dev <dev@cloudstack.apache.org>
> Subject: Re: HA issues
>
> Hi,
>
> Installed and reinstalled, VM HA just does not work for me.
> In addition, if the HV going AWOL is hosting the systemvms, then they
> also do not get restarted despite available HVs online.
> I've opened another ticket with logs:
>
> https://issues.apache.org/jira/browse/CLOUDSTACK-10246
>
> Happy to allow access to my rig if it helps.
>
> I've disabled firewall and whatnot also left out other bits of network
> hardware just to keep it simpler, still no go.
>
> --
> Sent from the Delta quadrant using Borg technology!
>
> Nux!
> www.nux.ro
>
> ----- Original Message -----
>> From: "Paul Angus" <paul.an...@shapeblue.com>
>> To: "dev" <dev@cloudstack.apache.org>
>> Sent: Saturday, 20 January, 2018 08:40:01
>> Subject: RE: HA issues
>
>> No problem,
>>
>> To be honest host-ha was developed *because* vm-ha was not reliable
>> under a number of conditions, including a host failure.
>>
>> paul.an...@shapeblue.com
>> www.shapeblue.com
>> 53 Chandos Place, Covent Garden, London WC2N 4HS UK
>> @shapeblue
>>
>> -----Original Message-----
>> From: Nux! [mailto:n...@li.nux.ro]
>> Sent: 19 January 2018 14:26
>> To: dev <dev@cloudstack.apache.org>
>> Subject: Re: HA issues
>>
>> Hi Paul,
>>
>> Thanks for checking. My compute offering is HA enabled, of course.
>> Host HA is disabled as well as OOBM.
>>
>> I'll do the tests again on Monday and report back.
>>
>> --
>> Sent from the Delta quadrant using Borg technology!
>>
>> Nux!
>> www.nux.ro
>>
>> ----- Original Message -----
>>> From: "Paul Angus" <paul.an...@shapeblue.com>
>>> To: "dev" <dev@cloudstack.apache.org>
>>> Sent: Friday, 19 January, 2018 14:10:06
>>> Subject: RE: HA issues
>>
>>> Hey Nux,
>>>
>>> I've been testing out the host-ha feature against a couple of physical
>>> hosts.
>>> I've found that if the compute offering isn't ha enabled, then the vm isn't
>>> restarted on the original host when it is rebooted, or any other host. If
>>> the vm is ha-enabled, then the vm was restarted on the original host
>>> when host ha restarted the host.
>>>
>>> Can you double check that the instance was an ha-enabled one?
>>>
>>> OR
>>> maybe the timeouts for the host-ha are too long and the vm-ha
>>> timed out beforehand ...?
>>>
>>> Kind regards,
>>>
>>> Paul Angus
>>>
>>> paul.an...@shapeblue.com
>>> www.shapeblue.com
>>> 53 Chandos Place, Covent Garden, London WC2N 4HS UK
>>> @shapeblue
>>>
>>> -----Original Message-----
>>> From: Nux! [mailto:n...@li.nux.ro]
>>> Sent: 17 January 2018 09:12
>>> To: dev <dev@cloudstack.apache.org>
>>> Subject: Re: HA issues
>>>
>>> Right, sorry for using the terms interchangeably, I see what you mean.
>>>
>>> I'll do further testing then as VM HA was also not working in my setup.
>>>
>>> I'll be back.
>>>
>>> --
>>> Sent from the Delta quadrant using Borg technology!
>>>
>>> Nux!
>>> www.nux.ro
>>>
>>> ----- Original Message -----
>>>> From: "Rohit Yadav" <rohit.ya...@shapeblue.com>
>>>> To: "dev" <dev@cloudstack.apache.org>
>>>> Sent: Wednesday, 17 January, 2018 09:09:19
>>>> Subject: Re: HA issues
>>>
>>>> Hi Lucian,
>>>>
>>>> The "Host HA" feature is entirely different from VM HA, however,
>>>> they may work in tandem, so please stop using the terms
>>>> interchangeably as it may cause the community to believe a
>>>> regression has been caused.
>>>>
>>>> The "Host HA" feature currently ships with only a "Host HA" provider
>>>> for KVM that is strictly tied to out-of-band management (IPMI for
>>>> fencing, i.e. power off, and recovery, i.e. reboot) and NFS (as primary
>>>> storage).
>>>> (We also have a provider for simulator, but that's for
>>>> coverage/testing purposes).
>>>>
>>>> Therefore, "Host HA" for KVM (+nfs) currently works only when OOBM is
>>>> enabled.
>>>> The framework allows interested parties to write their own HA
>>>> providers for a hypervisor that can use a different
>>>> strategy/mechanism for fencing/recovery of hosts (including writing a
>>>> non-IPMI based OOBM plugin) and a host/disk activity checker that is
>>>> non-NFS based.
>>>>
>>>> The "Host HA" feature ships disabled by default and does not cause
>>>> any interference with VM HA. However, when enabled and configured
>>>> correctly, it is a known limitation that when it is unable to
>>>> successfully perform recovery or fencing tasks it may not trigger
>>>> VM HA. We can discuss how to handle such cases (thoughts?). "Host HA"
>>>> would try a couple of times to recover and, failing to do so, it would
>>>> eventually trigger a host fencing task.
>>>> If it's unable to fence a host, it will indefinitely attempt to fence
>>>> the host (the host state will be stuck at fencing state in
>>>> cloud.ha_config table for example) and alerts will be sent to admin
>>>> who can do some manual intervention to handle such situations (if
>>>> you've email/smtp enabled, you should see alert emails).
>>>>
>>>> We can discuss how to improve and have a workaround for the case
>>>> you've hit, thanks for sharing.
>>>>
>>>> - Rohit
>>>>
>>>> ________________________________
>>>> From: Nux! <n...@li.nux.ro>
>>>> Sent: Tuesday, January 16, 2018 10:42:35 PM
>>>> To: dev
>>>> Subject: Re: HA issues
>>>>
>>>> Ok, reinstalled and re-tested.
>>>>
>>>> What I've learned:
>>>>
>>>> - HA only works now if OOB is configured, the old way HA no longer
>>>> applies - this can be good and bad, not everyone has IPMIs
>>>>
>>>> - HA only works if IPMI is reachable. I've pulled the cord on a HV
>>>> and HA failed to do its thing, leaving me with a HV down along with
>>>> all the VMs running there. That's bad.
>>>> I've opened this ticket for it:
>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-10234
>>>>
>>>> Let me know if you need any extra info or stuff to test.
>>>>
>>>> Regards,
>>>> Lucian
>>>>
>>>> --
>>>> Sent from the Delta quadrant using Borg technology!
>>>>
>>>> Nux!
>>>> www.nux.ro
>>>>
>>>> rohit.ya...@shapeblue.com
>>>> www.shapeblue.com
>>>> 53 Chandos Place, Covent Garden, London WC2N 4HS UK
>>>> @shapeblue
>>>>
>>>> ----- Original Message -----
>>>>> From: "Nux!" <n...@li.nux.ro>
>>>>> To: "dev" <dev@cloudstack.apache.org>
>>>>> Sent: Tuesday, 16 January, 2018 11:35:58
>>>>> Subject: Re: HA issues
>>>>
>>>>> I'll reinstall my setup and try again, just to be sure I'm working
>>>>> on a clean slate.
>>>>>
>>>>> --
>>>>> Sent from the Delta quadrant using Borg technology!
>>>>>
>>>>> Nux!
>>>>> www.nux.ro
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Rohit Yadav" <rohit.ya...@shapeblue.com>
>>>>>> To: "dev" <dev@cloudstack.apache.org>
>>>>>> Sent: Tuesday, 16 January, 2018 11:29:51
>>>>>> Subject: Re: HA issues
>>>>>
>>>>>> Hi Lucian,
>>>>>>
>>>>>> If you're talking about the new HostHA feature (with
>>>>>> KVM+nfs+ipmi), please refer to the following docs:
>>>>>>
>>>>>> http://docs.cloudstack.apache.org/projects/cloudstack-administration/en/latest/hosts.html#out-of-band-management
>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA
>>>>>>
>>>>>> We'll need you to look at the logs; perhaps create a JIRA ticket with
>>>>>> the logs and details? If you saw an ipmi based reboot, then host-ha
>>>>>> indeed tried to recover, i.e. reboot the host. Once hostha has
>>>>>> done its work it would schedule HA for VM as soon as the recovery
>>>>>> operation succeeds (we've simulator and kvm based marvin tests
>>>>>> for such scenarios).
>>>>>>
>>>>>> Can you see it making an attempt to schedule VM ha in the logs, or any failure?
>>>>>>
>>>>>> - Rohit
>>>>>>
>>>>>> <https://cloudstack.apache.org>
>>>>>>
>>>>>> ________________________________
>>>>>> From: Nux! <n...@li.nux.ro>
>>>>>> Sent: Tuesday, January 16, 2018 12:47:56 AM
>>>>>> To: dev
>>>>>> Subject: [4.11] HA issues
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I see there's a new HA engine for KVM and IPMI support which is
>>>>>> really nice, however it seems hit and miss.
>>>>>> I have created an instance with HA offering, kernel panicked one
>>>>>> of the hypervisors - after a while the server was rebooted via
>>>>>> IPMI probably, but the instance never moved to a running
>>>>>> hypervisor and even after the original hypervisor came back it
>>>>>> was still left in Stopped state.
>>>>>> Are there any extra things I need to set up to have proper HA?
>>>>>>
>>>>>> Regards,
>>>>>> Lucian
>>>>>>
>>>>>> --
>>>>>> Sent from the Delta quadrant using Borg technology!
>>>>>>
>>>>>> Nux!
>>>>>> www.nux.ro
>>>>>>
>>>>>> rohit.ya...@shapeblue.com
>>>>>> www.shapeblue.com
>>>>>> 53 Chandos Place, Covent Garden, London WC2N 4HS UK
>>>>>> @shapeblue
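
For readers following the thread, the Host HA behaviour Rohit describes (try to recover the host a couple of times, then move to fencing, and keep retrying fencing while alerting the admin) can be pictured with a rough sketch. This is not CloudStack code: the class, field and state names below are hypothetical, the attempt counts are assumptions, and the fencing loop is capped only so the example terminates, whereas the thread says fencing is retried indefinitely.

```python
from dataclasses import dataclass, field
from typing import Callable, List


# Hypothetical toy model, NOT CloudStack's implementation or API.
@dataclass
class HostHaSketch:
    recover: Callable[[], bool]     # stands in for an IPMI reboot; True on success
    fence: Callable[[], bool]       # stands in for an IPMI power-off; True on success
    max_recovery_attempts: int = 2  # "a couple of times" (assumed value)
    max_fence_attempts: int = 5     # demo cap only; the thread says retries are indefinite
    alerts: List[str] = field(default_factory=list)

    def run(self) -> str:
        # Phase 1: try to recover (reboot) the host a limited number of times.
        for _ in range(self.max_recovery_attempts):
            if self.recover():
                return "Recovered"  # per the thread, VM HA is scheduled once recovery succeeds
        # Phase 2: recovery failed, so try to fence (power off) the host.
        for attempt in range(1, self.max_fence_attempts + 1):
            if self.fence():
                return "Fenced"
            # Each failed fencing attempt raises an alert for the admin.
            self.alerts.append(f"fencing attempt {attempt} failed")
        # Still not fenced: the host stays in the fencing state until an admin steps in.
        return "Fencing"


if __name__ == "__main__":
    # Models Lucian's "pulled the cord" case: IPMI unreachable, so both actions fail.
    unreachable = HostHaSketch(recover=lambda: False, fence=lambda: False)
    print(unreachable.run(), len(unreachable.alerts))
```

With both callbacks failing, the sketch ends in the stuck fencing state with one alert per attempt, which matches the situation reported in CLOUDSTACK-10234 where the host's IPMI is unreachable.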