Nothing, looks good. (And thanks for opening it, Paul.)

On Wed, Jul 24, 2013 at 10:27 PM, Bryan Whitehead wrote:
> The CLOUDSTACK-3535 bug looks like it describes the problem perfectly.
> What else can we add?
>
> On Wed, Jul 24, 2013 at 7:20 PM, Chip Childers wrote:
> > This sucks.
> >
> > Can one of the folks on this thread please open a bug with as much
> > information as possible? I'd like to make sure that someone picks up the
> > issue and gets it resolved for the next release.
> >
> > On Wed, Jul 24, 2013 at 7:26 PM, Bryan Whitehead wrote:
> >
> >> The same thing happened to me, but in my case a power supply died
> >> on a box. All my templates have HA turned on.
> >>
> >> All the VMs (including 1 system-router-vm) were shown as "Running"
> >> and the host itself was simply marked "Disconnected". When I tried to
> >> shut down the VMs to start them again, I got errors about not being
> >> able to communicate with the agent. I tried restarting the management
> >> server, but that didn't change anything.
> >>
> >> Getting the router working again was extremely annoying. After I
> >> changed it to Stopped, CloudStack kept trying to start it again on the
> >> dead host. I marked it Destroyed, then restarted the network with the
> >> force option. That fixed it. After I hacked the DB to change every VM
> >> that was not actually running from state Running to Stopped, I was able
> >> to start all the VMs that were down on the bad host.
> >>
> >> Anyway, the time between the host dying and me finding out was about
> >> 4 days, as these were managed servers of a customer and their
> >> monitoring of each host wasn't working. They were pretty unhappy. :(
> >>
> >> Other notes: this is KVM with SharedMountPoint on a Gluster mount.
> >> After the host got back online, Gluster rsynced about 200GB of data; I
> >> migrated VMs to the host at the same time as normal. I've had similar
> >> things happen with a 3.0.2 install of CloudStack, and everything
> >> restarted seamlessly.
> >> Disappointing that this happened with 4.1.
> >>
> >> On Wed, Jul 24, 2013 at 9:23 AM, Indra Pramana wrote:
> >> > Dear Chip, Geoff and all,
> >> >
> >> > I went through the management server's logs covering the period from
> >> > when I shut down the host to when I turned it back on.
> >> >
> >> > These are the management server's logs while the host was being shut
> >> > down:
> >> >
> >> > http://pastebin.com/4wfV830Z
> >> >
> >> > During that period I noticed quite a lot of "Sending Disconnect to
> >> > listener" messages, which implies that the management server tries to
> >> > notify other listeners that the host is going down. However, I did not
> >> > subsequently see any messages in the logs showing the management
> >> > server trying to activate HA to start the affected VMs on another
> >> > available host.
> >> >
> >> > These are the management server's logs when the host was turned back
> >> > on:
> >> >
> >> > http://pastebin.com/JrLJxbXH
> >> >
> >> > When the agent reconnected, CloudStack changed the affected VMs from
> >> > Running to Stopped:
> >> >
> >> > ===
> >> > 2013-07-24 23:04:57,406 DEBUG [cloud.vm.VirtualMachineManagerImpl] (AgentConnectTaskPool-7:null) Found 5 VMs for host 34
> >> > 2013-07-24 23:04:57,408 DEBUG [cloud.vm.VirtualMachineManagerImpl] (AgentConnectTaskPool-7:null) VM i-2-273-VM: cs state = Running and realState = Stopped
> >> > 2013-07-24 23:04:57,408 DEBUG [cloud.vm.VirtualMachineManagerImpl] (AgentConnectTaskPool-7:null) VM i-2-273-VM: cs state = Running and realState = Stopped
> >> > 2013-07-24 23:04:57,408 DEBUG [cloud.ha.HighAvailabilityManagerImpl] (AgentConnectTaskPool-7:null) VM does not require investigation so I'm marking it as Stopped: VM[User|Ubuntu-12-04-2-64bit]
> >> > 2013-07-24 23:04:57,450 DEBUG [cloud.capacity.CapacityManagerImpl] (AgentConnectTaskPool-7:null) VM state transitted from :Running to Stopping with event: StopRequestedvm's original host id: 28 new host id: 34 host id before state transition: 34
> >> > ===
> >> >
> >> > Then HA starts to kick in:
> >> >
> >> > ===
> >> > 2013-07-24 23:04:57,955 INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-307) Processing HAWork[307-HA-273-Stopped-Scheduled]
> >> > 2013-07-24 23:04:57,956 DEBUG [cloud.capacity.CapacityManagerImpl] (AgentConnectTaskPool-7:null) VM state transitted from :Running to Stopping with event: StopRequestedvm's original host id: 28 new host id: 34 host id before state transition: 34
> >> > 2013-07-24 23:04:57,960 DEBUG [agent.transport.Request] (AgentConnectTaskPool-7:null) Seq 34-105644038: Sending { Cmd , MgmtId: 161342671900, via: 34, Ver: v1, Flags: 100111, [{"StopCommand":{"isProxy":false,"vmName":"i-2-281-VM","wait":0}}] }
> >> > 2013-07-24 23:04:57,968 INFO [cloud.ha.HighAvailabilityManagerImpl] (HA-Worker-1:work-307) HA on VM[User|Ubuntu-12-04-2-64bit]
> >> > 2013-07-24 23:04:57,984 DEBUG [cloud.capacity.CapacityManagerImpl] (HA-Worker-1:work-307) VM state transitted from :Stopped to Starting with event: StartRequestedvm's original host id: 28 new host id: null host id before state transition: null
> >> > 2013-07-24 23:04:57,984 DEBUG [cloud.vm.VirtualMachineManagerImpl] (HA-Worker-1:work-307) Successfully transitioned to start state for VM[User|Ubuntu-12-04-2-64bit] reservation id = b56364ef-90d8-443f-a348-7660fda48d34
> >> > 2013-07-24 23:04:58,025 DEBUG [cloud.vm.VirtualMachineManagerImpl] (HA-Worker-1:work-307) Trying to deploy VM, vm has dcId: 6 and podId: 6
> >> > 2013-07-24 23:04:58,025 DEBUG [cloud.vm.VirtualMachineManagerImpl] (HA-Worker-1:work-307) Deploy avoids pods: null, clusters: null, hosts: null
> >> > 2013-07-24 23:04:58,031 DEBUG [cloud.vm.VirtualMachineManagerImpl] (HA-Worker-1:work-307) Root volume is ready, need to place VM in volume's cluster
> >> > 2013-07-24 23:04:58,031 DEBUG [cloud.vm.VirtualMachineManagerImpl] (HA-Worker-1:work-307) Vol[295|vm=273|ROOT] is READY, changing deployment plan to use this pool's dcId: 6 , podId: 6 , and clusterId: 6
> >> > ===
> >> >
> >> > My question is: why does HA only kick in when the host is turned back
> >> > on? It should kick in soon after the host is shut down and marked as
> >> > "Disconnected".
> >> >
> >> > Any insights into possible solutions to this problem would be highly
> >> > appreciated.
> >> >
> >> > Looking forward to your reply, thank you.
> >> >
> >> > Cheers.
> >> >
> >> > On Thu, Jul 25, 2013 at 12:00 AM, Indra Pramana wrote:
> >> >
> >> >> Hi Chip,
> >> >>
> >> >> Yes, "Offer HA" is set to "Yes" on all my compute offerings.
> >> >>
> >> >> Hi Geoff,
> >> >>
> >> >> Yes, I am using KVM. Is this a known issue, and is there any solution
> >> >> to this problem?
> >> >>
> >> >> Looking forward to your reply, thank you.
> >> >>
> >> >> Cheers.
> >> >>
> >> >> On Wed, Jul 24, 2013 at 11:38 PM, Geoff Higginbottom wrote:
> >> >>
> >> >>> Is it running on KVM? We are seeing some real issues with HA simply
> >> >>> not working on KVM.
> >> >>>
> >> >>> Regards
> >> >>>
> >> >>> Geoff Higginbottom
> >> >>>
> >> >>> D: +44 20 3603 0542 | S: +44 20 3603 0540 | M: +447968161581
> >> >>>
> >> >>> -----Original Message-----
> >> >>> From: Chip Childers
> >> >>> Sent: 24 July 2013 16:37
> >> >>> Subject: Re: HA not working - CloudStack 4.1.0 and KVM hypervisor hosts
> >> >>>
> >> >>> Did you enable HA for your compute offering?
> >> >>>
> >> >>> On Jul 24, 2013, at 11:25 AM, Indra Pramana wrote:
> >> >>>
> >> >>> > Dear all,
> >> >>> >
> >> >>> > I shut down one of my hypervisor hosts to simulate a server
> >> >>> > failure, and HA is not working: none of the VMs on the affected
> >> >>> > host are started on another available host.
> >> >>> >
> >> >>> > I am using CloudStack 4.1.0 with KVM hypervisors and Ceph RBD for
> >> >>> > primary storage.
> >> >>> >
> >> >>> > My issue is similar to the one described here:
> >> >>> >
> >> >>> > https://issues.apache.org/jira/browse/CLOUDSTACK-3535
> >> >>> >
> >> >>> > The difference is that in my case the host is indeed marked as
> >> >>> > "Disconnected", but CloudStack makes no attempt to start the VMs on
> >> >>> > another host. I can't provide logs, since nothing in the logs
> >> >>> > suggests that CloudStack tries to activate HA and start the
> >> >>> > affected VMs on another host.
> >> >>> >
> >> >>> > Has anyone had a similar experience? Does anyone know whether the
> >> >>> > above bug has been resolved?
> >> >>> >
> >> >>> > Looking forward to your reply, thank you.
> >> >>> >
> >> >>> > Cheers.
> >> >>> This email and any attachments to it may be confidential and are
> >> >>> intended solely for the use of the individual to whom it is
> >> >>> addressed. Any views or opinions expressed are solely those of the
> >> >>> author and do not necessarily represent those of Shape Blue Ltd or
> >> >>> related companies. If you are not the intended recipient of this
> >> >>> email, you must neither take any action based upon its contents, nor
> >> >>> copy or show it to anyone. Please contact the sender if you believe
> >> >>> you have received this email in error. Shape Blue Ltd is a company
> >> >>> incorporated in England & Wales. ShapeBlue Services India LLP is
> >> >>> operated under license from Shape Blue Ltd. ShapeBlue is a registered
> >> >>> trademark.
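Much of this thread turns on two kinds of management-server log entries: lines where CloudStack's recorded state (`cs state`) disagrees with what the agent reports (`realState`), and the `VM state transitted from ...` lines that show when HA actually started acting. As a hedged aid for anyone going through their own logs, here is a minimal Python sketch that pulls both kinds of entries out of a management-server log. It assumes each log entry sits on a single line (the excerpts in this thread were wrapped by mail clients) and that the wording matches the 4.1.0 lines quoted above; the regexes and the names `scan`, `Mismatch` and `Transition` are illustrative only and are not part of CloudStack.

```python
import re
from typing import Iterable, List, NamedTuple, Tuple

# Timestamp prefix of the management-server log lines quoted in this thread.
_TS = r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"

# e.g. "... VM i-2-273-VM: cs state = Running and realState = Stopped"
_MISMATCH = re.compile(
    _TS + r".*?VM (?P<vm>\S+): cs state = (?P<cs>\w+) and realState = (?P<real>\w+)"
)

# e.g. "... VM state transitted from :Running to Stopping with event: StopRequestedvm's ..."
# ("transitted" is the log's own spelling.) The event name runs straight into
# "vm's", so the event group matches lazily up to that literal.
_TRANSITION = re.compile(
    _TS + r".*?VM state transitted from :(?P<src>\w+) to (?P<dst>\w+)"
    r" with event: (?P<event>\w+?)vm's"
)


class Mismatch(NamedTuple):
    ts: str
    vm: str
    cs_state: str    # state recorded by the management server
    real_state: str  # state reported by the agent


class Transition(NamedTuple):
    ts: str
    src: str
    dst: str
    event: str


def scan(lines: Iterable[str]) -> Tuple[List[Mismatch], List[Transition]]:
    """Collect state mismatches (first one per VM) and state transitions."""
    mismatches: List[Mismatch] = []
    transitions: List[Transition] = []
    seen_vms = set()
    for line in lines:
        m = _MISMATCH.search(line)
        if m:
            # Repeated mismatch lines for the same VM are reported only once.
            if m["cs"] != m["real"] and m["vm"] not in seen_vms:
                seen_vms.add(m["vm"])
                mismatches.append(Mismatch(m["ts"], m["vm"], m["cs"], m["real"]))
            continue
        t = _TRANSITION.search(line)
        if t:
            transitions.append(Transition(t["ts"], t["src"], t["dst"], t["event"]))
    return mismatches, transitions


if __name__ == "__main__":
    sample = [
        "2013-07-24 23:04:57,408 DEBUG [cloud.vm.VirtualMachineManagerImpl] "
        "(AgentConnectTaskPool-7:null) VM i-2-273-VM: cs state = Running and realState = Stopped",
        "2013-07-24 23:04:57,984 DEBUG [cloud.capacity.CapacityManagerImpl] "
        "(HA-Worker-1:work-307) VM state transitted from :Stopped to Starting with event: "
        "StartRequestedvm's original host id: 28 new host id: null host id before state transition: null",
    ]
    found, moves = scan(sample)
    for mm in found:
        print(mm.ts, mm.vm, f"{mm.cs_state} -> {mm.real_state}")
    for tr in moves:
        print(tr.ts, f"{tr.src} -> {tr.dst}", tr.event)
    # prints:
    # 2013-07-24 23:04:57,408 i-2-273-VM Running -> Stopped
    # 2013-07-24 23:04:57,984 Stopped -> Starting StartRequested
```

Run against a full log, the timestamps of the first `Stopped -> Starting` transitions compared with the time the host went down give a quick measure of the delay Indra and Bryan describe. Matching on message text is fragile across releases, so the patterns would likely need adjusting for other CloudStack versions.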
