Hi Paul,

Thanks for checking. My compute offering is HA-enabled, of course. Host HA is disabled, as is OOBM.
I'll do the tests again on Monday and report back.

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

----- Original Message -----
> From: "Paul Angus" <paul.an...@shapeblue.com>
> To: "dev" <dev@cloudstack.apache.org>
> Sent: Friday, 19 January, 2018 14:10:06
> Subject: RE: HA issues

> Hey Nux,
>
> I've been testing out the host-ha feature against a couple of physical hosts.
> I've found that if the compute offering isn't ha-enabled, then the vm isn't
> restarted on the original host when it is rebooted, or on any other host. If
> the vm is ha-enabled, then the vm is restarted on the original host when host
> ha restarts the host.
>
> Can you double check that the instance was an ha-enabled one?
>
> OR
> maybe the timeouts for the host-ha are too long and the vm-ha timed out
> beforehand ...?
>
> Kind regards,
>
> Paul Angus
>
> paul.an...@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London WC2N 4HSUK
> @shapeblue
>
> -----Original Message-----
> From: Nux! [mailto:n...@li.nux.ro]
> Sent: 17 January 2018 09:12
> To: dev <dev@cloudstack.apache.org>
> Subject: Re: HA issues
>
> Right, sorry for using the terms interchangeably, I see what you mean.
>
> I'll do further testing then as VM HA was also not working in my setup.
>
> I'll be back.
>
> --
> Sent from the Delta quadrant using Borg technology!
>
> Nux!
> www.nux.ro
>
> ----- Original Message -----
>> From: "Rohit Yadav" <rohit.ya...@shapeblue.com>
>> To: "dev" <dev@cloudstack.apache.org>
>> Sent: Wednesday, 17 January, 2018 09:09:19
>> Subject: Re: HA issues
>
>> Hi Lucian,
>>
>> The "Host HA" feature is entirely different from VM HA, however, they
>> may work in tandem, so please stop using the terms interchangeably as
>> it may cause the community to believe a regression has been caused.
>>
>> The "Host HA" feature currently ships with only one "Host HA" provider, for
>> KVM, that is strictly tied to out-of-band management (IPMI for fencing,
>> i.e. power off, and recovery, i.e. reboot) and NFS (as primary storage).
>> (We also have a provider for simulator, but that's for coverage/testing
>> purposes).
>>
>> Therefore, "Host HA" for KVM (+nfs) currently works only when OOBM is
>> enabled.
>> The framework allows interested parties to write their own HA
>> providers for a hypervisor that can use a different strategy/mechanism
>> for fencing/recovery of hosts (including writing a non-IPMI based OOBM
>> plugin) and a host/disk activity checker that is non-NFS based.
>>
>> The "Host HA" feature ships disabled by default and does not cause any
>> interference with VM HA. However, when enabled and configured
>> correctly, it is a known limitation that when it is unable to
>> successfully perform recovery or fencing tasks it may not trigger VM
>> HA. We can discuss how to handle such cases (thoughts?). "Host HA"
>> would try a couple of times to recover and, failing to do so, it would
>> eventually trigger a host fencing task. If it's unable to fence a
>> host, it will indefinitely attempt to fence the host (the host state
>> will be stuck at fencing state in cloud.ha_config table for example)
>> and alerts will be sent to the admin, who can do some manual intervention
>> to handle such situations (if you've email/smtp enabled, you should see
>> alert emails).
>>
>> We can discuss how to improve and have a workaround for the case
>> you've hit, thanks for sharing.
>>
>> - Rohit
>>
>> ________________________________
>> From: Nux! <n...@li.nux.ro>
>> Sent: Tuesday, January 16, 2018 10:42:35 PM
>> To: dev
>> Subject: Re: HA issues
>>
>> Ok, reinstalled and re-tested.
>>
>> What I've learned:
>>
>> - HA only works now if OOB is configured, the old way HA no longer
>> applies - this can be good and bad, not everyone has IPMIs
>>
>> - HA only works if IPMI is reachable. I've pulled the cord on a HV and
>> HA failed to do its thing, leaving me with a HV down along with all
>> the VMs running there. That's bad.
>> I've opened this ticket for it:
>> https://issues.apache.org/jira/browse/CLOUDSTACK-10234
>>
>> Let me know if you need any extra info or stuff to test.
>>
>> Regards,
>> Lucian
>>
>> --
>> Sent from the Delta quadrant using Borg technology!
>>
>> Nux!
>> www.nux.ro
>>
>> rohit.ya...@shapeblue.com
>> www.shapeblue.com
>> 53 Chandos Place, Covent Garden, London WC2N 4HSUK
>> @shapeblue
>>
>> ----- Original Message -----
>>> From: "Nux!" <n...@li.nux.ro>
>>> To: "dev" <dev@cloudstack.apache.org>
>>> Sent: Tuesday, 16 January, 2018 11:35:58
>>> Subject: Re: HA issues
>>
>>> I'll reinstall my setup and try again, just to be sure I'm working on
>>> a clean slate.
>>>
>>> --
>>> Sent from the Delta quadrant using Borg technology!
>>>
>>> Nux!
>>> www.nux.ro
>>>
>>> ----- Original Message -----
>>>> From: "Rohit Yadav" <rohit.ya...@shapeblue.com>
>>>> To: "dev" <dev@cloudstack.apache.org>
>>>> Sent: Tuesday, 16 January, 2018 11:29:51
>>>> Subject: Re: HA issues
>>>
>>>> Hi Lucian,
>>>>
>>>> If you're talking about the new HostHA feature (with KVM+nfs+ipmi),
>>>> please refer to the following docs:
>>>>
>>>> http://docs.cloudstack.apache.org/projects/cloudstack-administration/en/latest/hosts.html#out-of-band-management
>>>>
>>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA
>>>>
>>>> We'll need you to look at the logs; perhaps create a JIRA ticket with the
>>>> logs and details? If you saw ipmi based reboot, then host-ha indeed
>>>> tried to recover i.e.
>>>> reboot the host. Once hostha has done its work
>>>> it would schedule HA for VM as soon as the recovery operation
>>>> succeeds (we've simulator and kvm based marvin tests for such scenarios).
>>>>
>>>> Can you see it making an attempt to schedule VM ha in the logs, or any failure?
>>>>
>>>> - Rohit
>>>>
>>>> <https://cloudstack.apache.org>
>>>>
>>>> ________________________________
>>>> From: Nux! <n...@li.nux.ro>
>>>> Sent: Tuesday, January 16, 2018 12:47:56 AM
>>>> To: dev
>>>> Subject: [4.11] HA issues
>>>>
>>>> Hi,
>>>>
>>>> I see there's a new HA engine for KVM and IPMI support which is
>>>> really nice, however it seems hit and miss.
>>>> I have created an instance with HA offering, kernel panicked one of
>>>> the hypervisors - after a while the server was rebooted via IPMI
>>>> probably, but the instance never moved to a running hypervisor and
>>>> even after the original hypervisor came back it was still left in Stopped
>>>> state.
>>>> Are there any extra things I need to set up to have proper HA?
>>>>
>>>> Regards,
>>>> Lucian
>>>>
>>>> --
>>>> Sent from the Delta quadrant using Borg technology!
>>>>
>>>> Nux!
>>>> www.nux.ro
>>>>
>>>> rohit.ya...@shapeblue.com
>>>> www.shapeblue.com
>>>> 53 Chandos Place, Covent Garden, London WC2N 4HSUK
>
> @shapeblue