Nothing, looks good.

(And thanks for opening it, Paul.)


On Wed, Jul 24, 2013 at 10:27 PM, Bryan Whitehead <[email protected]> wrote:

> The CLOUDSTACK-3535 bug looks like it describes the problem perfectly.
> What else can we add?
>
> On Wed, Jul 24, 2013 at 7:20 PM, Chip Childers
> <[email protected]> wrote:
> > This sucks.
> >
> > Can one of the folks on this thread please open a bug with as much
> > information as possible?  I'd like to make sure that someone picks up the
> > issue and gets it resolved for the next release.
> >
> >
> >
> > On Wed, Jul 24, 2013 at 7:26 PM, Bryan Whitehead <[email protected]> wrote:
> >
> >> This same thing happened to me - but it was a power supply that died
> >> on a box. All my templates have HA turned on.
> >>
> >> All the VMs (including 1 system-router-vm) were shown as "Running"
> >> and the host itself was simply marked "Disconnected". When I tried to
> >> shut down the VMs to start them again I got errors about not being
> >> able to communicate with the agent. I tried restarting the management
> >> server but that didn't change anything.
> >>
> >> Getting the router working again was extremely annoying. After I
> >> changed it to Stopped, CloudStack kept trying to start it again on the
> >> dead host. I marked it destroyed, then restarted the network with the
> >> force option. That fixed it. After I hacked the DB to change all my VMs
> >> that were not actually running from state Running to Stopped, I was
> >> able to start all the VMs that were down on the bad host.
> >>
> >> Anyway, the time between host death and me finding out was about 4
> >> days - these were on managed servers of a customer and their
> >> monitoring of each host wasn't working. They were pretty unhappy. :(
> >>
> >> Other notes: this is KVM with sharedmountpoint on a gluster mount.
> >> After the host got back online gluster rsynced about 200GB of data - I
> >> migrated VMs to the host at the same time as normal. I've had a
> >> similar thing happen with a 3.0.2 install of CloudStack and everything
> >> seamlessly restarted. Disappointing that this happened with 4.1.
> >>
> >> On Wed, Jul 24, 2013 at 9:23 AM, Indra Pramana <[email protected]> wrote:
> >> > Dear Chip, Geoff and all,
> >> >
> >> > I scrutinized the management server's logs from the time when I shut
> >> > down the host and the time when I turned the host back on.
> >> >
> >> > These are the management server's logs from when the host was being shut down:
> >> >
> >> > http://pastebin.com/4wfV830Z
> >> >
> >> > During that time, I noted quite a lot of "Sending Disconnect to listener"
> >> > messages, which implies that the management server tries to notify other
> >> > listeners that the host is going down. However, I subsequently didn't see
> >> > any messages in the logs showing that the management server was trying to
> >> > activate the HA capability to start the affected VMs on another available
> >> > host.
> >> >
> >> > These are the management server's logs from when the host was being turned back on:
> >> >
> >> > http://pastebin.com/JrLJxbXH
> >> >
> >> > When the agent reconnected, CloudStack marked the affected VMs as
> >> > Stopped (previously Running):
> >> >
> >> > ===
> >> > 2013-07-24 23:04:57,406 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> >> > (AgentConnectTaskPool-7:null) Found 5 VMs for host 34
> >> > 2013-07-24 23:04:57,408 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> >> > (AgentConnectTaskPool-7:null) VM i-2-273-VM: cs state = Running and
> >> > realState = Stopped
> >> > 2013-07-24 23:04:57,408 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> >> > (AgentConnectTaskPool-7:null) VM i-2-273-VM: cs state = Running and
> >> > realState = Stopped
> >> > 2013-07-24 23:04:57,408 DEBUG [cloud.ha.HighAvailabilityManagerImpl]
> >> > (AgentConnectTaskPool-7:null) VM does not require investigation so I'm
> >> > marking it as Stopped: VM[User|Ubuntu-12-04-2-64bit]
> >> > 2013-07-24 23:04:57,450 DEBUG [cloud.capacity.CapacityManagerImpl]
> >> > (AgentConnectTaskPool-7:null) VM state transitted from :Running to Stopping
> >> > with event: StopRequestedvm's original host id: 28 new host id: 34 host id
> >> > before state transition: 34
> >> > ===
> >> >
> >> > Then the HA starts to kick in.
> >> >
> >> > ===
> >> > 2013-07-24 23:04:57,955 INFO  [cloud.ha.HighAvailabilityManagerImpl]
> >> > (HA-Worker-1:work-307) Processing HAWork[307-HA-273-Stopped-Scheduled]
> >> > 2013-07-24 23:04:57,956 DEBUG [cloud.capacity.CapacityManagerImpl]
> >> > (AgentConnectTaskPool-7:null) VM state transitted from :Running to Stopping
> >> > with event: StopRequestedvm's original host id: 28 new host id: 34 host id
> >> > before state transition: 34
> >> > 2013-07-24 23:04:57,960 DEBUG [agent.transport.Request]
> >> > (AgentConnectTaskPool-7:null) Seq 34-105644038: Sending  { Cmd , MgmtId:
> >> > 161342671900, via: 34, Ver: v1, Flags: 100111,
> >> > [{"StopCommand":{"isProxy":false,"vmName":"i-2-281-VM","wait":0}}] }
> >> > 2013-07-24 23:04:57,968 INFO  [cloud.ha.HighAvailabilityManagerImpl]
> >> > (HA-Worker-1:work-307) HA on VM[User|Ubuntu-12-04-2-64bit]
> >> > 2013-07-24 23:04:57,984 DEBUG [cloud.capacity.CapacityManagerImpl]
> >> > (HA-Worker-1:work-307) VM state transitted from :Stopped to Starting with
> >> > event: StartRequestedvm's original host id: 28 new host id: null host id
> >> > before state transition: null
> >> > 2013-07-24 23:04:57,984 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> >> > (HA-Worker-1:work-307) Successfully transitioned to start state for
> >> > VM[User|Ubuntu-12-04-2-64bit] reservation id =
> >> > b56364ef-90d8-443f-a348-7660fda48d34
> >> > 2013-07-24 23:04:58,025 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> >> > (HA-Worker-1:work-307) Trying to deploy VM, vm has dcId: 6 and podId: 6
> >> > 2013-07-24 23:04:58,025 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> >> > (HA-Worker-1:work-307) Deploy avoids pods: null, clusters: null, hosts: null
> >> > 2013-07-24 23:04:58,031 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> >> > (HA-Worker-1:work-307) Root volume is ready, need to place VM in volume's
> >> > cluster
> >> > 2013-07-24 23:04:58,031 DEBUG [cloud.vm.VirtualMachineManagerImpl]
> >> > (HA-Worker-1:work-307) Vol[295|vm=273|ROOT] is READY, changing deployment
> >> > plan to use this pool's dcId: 6 , podId: 6 , and clusterId: 6
> >> > ===
> >> >
> >> > My question is: why does HA only kick in when the host is turned back on?
> >> > By rights it should kick in soon after the host is shut down and marked as
> >> > "Disconnected".
> >> >
> >> > Any insight into possible solutions to this problem would be highly
> >> > appreciated.
> >> >
> >> > Looking forward to your reply, thank you.
> >> >
> >> > Cheers.
> >> >
> >> >
> >> >
> >> > On Thu, Jul 25, 2013 at 12:00 AM, Indra Pramana <[email protected]> wrote:
> >> >
> >> >> Hi Chip,
> >> >>
> >> >> Yes, "Offer HA" is set to "Yes" on all my compute offerings.
> >> >>
> >> >> Hi Geoff,
> >> >>
> >> >> Yes, I am using KVM. Is this a known issue and is there any solution to
> >> >> this problem?
> >> >>
> >> >> Looking forward to your reply, thank you.
> >> >>
> >> >> Cheers.
> >> >>
> >> >>
> >> >>
> >> >> On Wed, Jul 24, 2013 at 11:38 PM, Geoff Higginbottom <[email protected]> wrote:
> >> >>
> >> >>> Is it running on KVM? We are seeing some real issues with HA simply not
> >> >>> working on KVM.
> >> >>>
> >> >>> Regards
> >> >>>
> >> >>> Geoff Higginbottom
> >> >>>
> >> >>> D: +44 20 3603 0542 | S: +44 20 3603 0540 | M: +447968161581
> >> >>>
> >> >>> [email protected]
> >> >>>
> >> >>> -----Original Message-----
> >> >>> From: Chip Childers [mailto:[email protected]]
> >> >>> Sent: 24 July 2013 16:37
> >> >>> To: <[email protected]>
> >> >>> Subject: Re: HA not working - CloudStack 4.1.0 and KVM hypervisor hosts
> >> >>>
> >> >>> Did you enable HA for your compute offering?
> >> >>>
> >> >>> On Jul 24, 2013, at 11:25 AM, Indra Pramana <[email protected]> wrote:
> >> >>>
> >> >>> > Dear all,
> >> >>> >
> >> >>> > I tried to shut down one of my hypervisor hosts to simulate a server
> >> >>> > failure, and HA is not working: none of the VMs on the affected host
> >> >>> > are started on another available host.
> >> >>> >
> >> >>> > I am using CloudStack 4.1.0 with KVM hypervisors and Ceph RBD for
> >> >>> > primary storage.
> >> >>> >
> >> >>> > My issue is similar to what is being described here:
> >> >>> >
> >> >>> > https://issues.apache.org/jira/browse/CLOUDSTACK-3535
> >> >>> >
> >> >>> > Except that in my case, the host is indeed marked as "Disconnected"
> >> >>> > but there is no attempt from CloudStack to start the VMs on another
> >> >>> > host. I can't provide logs since there's nothing in the logs which
> >> >>> > suggests that CloudStack tries to activate HA and start the affected
> >> >>> > VMs on another host.
> >> >>> >
> >> >>> > Has anyone had a similar experience? Does anyone know if the above bug
> >> >>> > has been resolved?
> >> >>> >
> >> >>> > Looking forward to your reply, thank you.
> >> >>> >
> >> >>> > Cheers.
> >> >>>
> >> >>
> >> >>
> >>
> >>
>
>
