Dear Chip, Geoff and all,

I went through the management server's logs for the period when I shut down
the host and the period when I turned it back on.

These are the management server's logs from when the host was being shut down:

http://pastebin.com/4wfV830Z

During that time, I noticed quite a lot of "Sending Disconnect to listener"
messages, which suggests that the management server tries to notify its
listeners that the host is going down. However, after that I didn't see any
messages in the logs showing that the management server was trying to
activate HA to start the affected VMs on another available host.

These are the management server's logs from when the host was being turned
back on:

http://pastebin.com/JrLJxbXH

When the agent reconnected, CloudStack marked the affected VMs, which it had
previously recorded as Running, as Stopped:

===
2013-07-24 23:04:57,406 DEBUG [cloud.vm.VirtualMachineManagerImpl]
(AgentConnectTaskPool-7:null) Found 5 VMs for host 34
2013-07-24 23:04:57,408 DEBUG [cloud.vm.VirtualMachineManagerImpl]
(AgentConnectTaskPool-7:null) VM i-2-273-VM: cs state = Running and
realState = Stopped
2013-07-24 23:04:57,408 DEBUG [cloud.vm.VirtualMachineManagerImpl]
(AgentConnectTaskPool-7:null) VM i-2-273-VM: cs state = Running and
realState = Stopped
2013-07-24 23:04:57,408 DEBUG [cloud.ha.HighAvailabilityManagerImpl]
(AgentConnectTaskPool-7:null) VM does not require investigation so I'm
marking it as Stopped: VM[User|Ubuntu-12-04-2-64bit]
2013-07-24 23:04:57,450 DEBUG [cloud.capacity.CapacityManagerImpl]
(AgentConnectTaskPool-7:null) VM state transitted from :Running to Stopping
with event: StopRequestedvm's original host id: 28 new host id: 34 host id
before state transition: 34
===

Only then does HA start to kick in:

===
2013-07-24 23:04:57,955 INFO  [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-1:work-307) Processing HAWork[307-HA-273-Stopped-Scheduled]
2013-07-24 23:04:57,956 DEBUG [cloud.capacity.CapacityManagerImpl]
(AgentConnectTaskPool-7:null) VM state transitted from :Running to Stopping
with event: StopRequestedvm's original host id: 28 new host id: 34 host id
before state transition: 34
2013-07-24 23:04:57,960 DEBUG [agent.transport.Request]
(AgentConnectTaskPool-7:null) Seq 34-105644038: Sending  { Cmd , MgmtId:
161342671900, via: 34, Ver: v1, Flags: 100111,
[{"StopCommand":{"isProxy":false,"vmName":"i-2-281-VM","wait":0}}] }
2013-07-24 23:04:57,968 INFO  [cloud.ha.HighAvailabilityManagerImpl]
(HA-Worker-1:work-307) HA on VM[User|Ubuntu-12-04-2-64bit]
2013-07-24 23:04:57,984 DEBUG [cloud.capacity.CapacityManagerImpl]
(HA-Worker-1:work-307) VM state transitted from :Stopped to Starting with
event: StartRequestedvm's original host id: 28 new host id: null host id
before state transition: null
2013-07-24 23:04:57,984 DEBUG [cloud.vm.VirtualMachineManagerImpl]
(HA-Worker-1:work-307) Successfully transitioned to start state for
VM[User|Ubuntu-12-04-2-64bit] reservation id =
b56364ef-90d8-443f-a348-7660fda48d34
2013-07-24 23:04:58,025 DEBUG [cloud.vm.VirtualMachineManagerImpl]
(HA-Worker-1:work-307) Trying to deploy VM, vm has dcId: 6 and podId: 6
2013-07-24 23:04:58,025 DEBUG [cloud.vm.VirtualMachineManagerImpl]
(HA-Worker-1:work-307) Deploy avoids pods: null, clusters: null, hosts: null
2013-07-24 23:04:58,031 DEBUG [cloud.vm.VirtualMachineManagerImpl]
(HA-Worker-1:work-307) Root volume is ready, need to place VM in volume's
cluster
2013-07-24 23:04:58,031 DEBUG [cloud.vm.VirtualMachineManagerImpl]
(HA-Worker-1:work-307) Vol[295|vm=273|ROOT] is READY, changing deployment
plan to use this pool's dcId: 6 , podId: 6 , and clusterId: 6
===
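
For anyone who wants to check the same thing in their own logs, below is a
minimal Python sketch (not part of CloudStack) that rejoins the wrapped log
lines into one entry per timestamp and lists the entries written by
HighAvailabilityManagerImpl. The regular expression assumes the
timestamp/level/[logger] layout seen in the excerpts above, and the names
parse_entries and ha_entries are my own, made up for illustration.

```python
import re
from datetime import datetime

# Assumed entry layout, taken from the excerpts above, e.g.
# "2013-07-24 23:04:57,955 INFO  [cloud.ha.HighAvailabilityManagerImpl]"
ENTRY_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+(\w+)\s+\[([^\]]+)\]\s*(.*)$"
)


def parse_entries(lines):
    """Turn raw log lines into entries, appending wrapped continuation
    lines to the entry that precedes them."""
    entries = []
    for raw in lines:
        line = raw.strip()
        match = ENTRY_RE.match(line)
        if match:
            stamp, level, logger, rest = match.groups()
            entries.append({
                "time": datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f"),
                "level": level,
                "logger": logger,
                "message": rest,
            })
        elif line and entries:
            # No timestamp: treat the line as a continuation of the last entry.
            joined = entries[-1]["message"] + " " + line
            entries[-1]["message"] = joined.strip()
    return entries


def ha_entries(entries, since=None):
    """Return entries logged by HighAvailabilityManagerImpl, optionally
    only those at or after the datetime given in `since`."""
    return [
        e for e in entries
        if e["logger"].endswith("HighAvailabilityManagerImpl")
        and (since is None or e["time"] >= since)
    ]
```

Passing the lines of the management server log to parse_entries, and then
calling ha_entries with `since` set to the time the host was shut down, should
show how long it takes before the first HA message appears.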

My question is: why does HA only kick in when the host is turned back on? It
should kick in soon after the host is shut down and marked as
"Disconnected".

Any insights into possible solutions to this problem would be highly
appreciated.

Looking forward to your reply, thank you.

Cheers.



On Thu, Jul 25, 2013 at 12:00 AM, Indra Pramana <[email protected]> wrote:

> Hi Chip,
>
> Yes, "Offer HA" is set to "Yes" on all my compute offerings.
>
> Hi Geoff,
>
> Yes, I am using KVM. Is this a known issue and is there any solution to
> this problem?
>
> Looking forward to your reply, thank you.
>
> Cheers.
>
>
>
> On Wed, Jul 24, 2013 at 11:38 PM, Geoff Higginbottom <
> [email protected]> wrote:
>
>> Is it running on KVM? We are seeing some real issues with HA simply not
>> working on KVM.
>>
>> Regards
>>
>> Geoff Higginbottom
>>
>> D: +44 20 3603 0542 | S: +44 20 3603 0540 | M: +447968161581
>>
>> [email protected]
>>
>> -----Original Message-----
>> From: Chip Childers [mailto:[email protected]]
>> Sent: 24 July 2013 16:37
>> To: <[email protected]>
>> Subject: Re: HA not working - CloudStack 4.1.0 and KVM hypervisor hosts
>>
>> Did you enable HA for your compute offering?
>>
>> On Jul 24, 2013, at 11:25 AM, Indra Pramana <[email protected]> wrote:
>>
>> > Dear all,
>> >
>> > I tried to shut down one of my hypervisor hosts to simulate a server
>> > failure, and HA is not working: none of the VMs on the affected host
>> > are started on another available host.
>> >
>> > I am using CloudStack 4.1.0 with KVM hypervisors and Ceph RBD for
>> > primary storage.
>> >
>> > My issue is similar to what is being described here:
>> >
>> > https://issues.apache.org/jira/browse/CLOUDSTACK-3535
>> >
>> > Except that in my case, the host is indeed marked as "Disconnected"
>> > but there is no attempt by CloudStack to start the VMs on another
>> > host. I can't provide logs since there is nothing in the logs to
>> > suggest that CloudStack tries to activate HA and start the affected
>> > VMs on another host.
>> >
>> > Does anyone have a similar experience? Does anyone know if the above
>> > bug has been resolved?
>> >
>> > Looking forward to your reply, thank you.
>> >
>> > Cheers.
>>
>
>
