Bug ID: CLOUDSTACK-3535

https://issues.apache.org/jira/browse/CLOUDSTACK-3535
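
For anyone picking up the ticket, here is a minimal Python sketch of the decision flow that the quoted logs suggest: each investigator either gives a definite answer or returns nothing ("I don't know"), and when none of them can decide, the host status is left unchanged. This is not CloudStack's actual code. HostState, investigate, next_status and ping_investigator are hypothetical names used only for illustration, and the "no verdict, no change" policy is an assumption read off the log messages.

```python
from enum import Enum
from typing import Callable, Iterable, Optional


class HostState(Enum):
    UP = "Up"
    DOWN = "Down"


# An investigator returns a HostState, or None when it cannot tell
# (the "I don't know" result seen in the logs). Hypothetical type alias.
Investigator = Callable[[str], Optional[HostState]]


def investigate(host: str, investigators: Iterable[Investigator]) -> Optional[HostState]:
    """Return the first definite answer, or None if no investigator can decide."""
    for check in investigators:
        state = check(host)
        if state is not None:
            return state
    return None


def next_status(current: str, verdict: Optional[HostState]) -> str:
    """Assumed policy matching the logged behaviour: no verdict, no change."""
    if verdict is None:
        return current  # "Agent state cannot be determined, do nothing"
    return verdict.value


def ping_investigator(host: str) -> Optional[HostState]:
    # Stand-in for a ping-based check that failed to reach the host.
    # An unreachable host proves nothing: a powered-off host and an
    # unplugged management NIC look the same from the management server.
    return None
```

With only inconclusive investigators, next_status("Up", investigate(host, [ping_investigator])) leaves the host at "Up", which matches the behaviour reported in the thread. A definite "Down" would need at least one investigator that can tell the two failure cases apart.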


Regards,

Paul Angus
S: +44 20 3603 0540 | M: +447711418784 | T: CloudyAngus
paul.an...@shapeblue.com

-----Original Message-----
From: Joe Brockmeier [mailto:j...@zonker.net]
Sent: 15 July 2013 15:32
To: dev@cloudstack.apache.org
Subject: Re: [URGENT] KVM HA - (FW: cs 4.1 host disconnected status)

Hi Paul,

What's the bug ID for this so we can track it properly?

Thanks!

Joe

On Mon, Jul 15, 2013, at 02:31 AM, Paul Angus wrote:
> I bumped this from the user list as we've just come across the same
> issue.
>
> CloudStack does not react or even change host status when contact is
> lost with a KVM host.
>
> 2013-07-13 17:53:56,695 DEBUG [cloud.ha.AbstractInvestigatorImpl]
> (AgentTaskPool-1:null) host (10.0.100.51) cannot be pinged, returning
> null ('I don't know')
> 2013-07-13 17:53:56,695 DEBUG [cloud.ha.UserVmDomRInvestigator]
> (AgentTaskPool-1:null) could not reach agent, could not reach agent's
> host, returning that we don't have enough information
> 2013-07-13 17:53:56,695 DEBUG [cloud.ha.HighAvailabilityManagerImpl]
> (AgentTaskPool-1:null) null unable to determine the state of the host.
> Moving on.
> 2013-07-13 17:53:56,695 DEBUG [cloud.ha.HighAvailabilityManagerImpl]
> (AgentTaskPool-1:null) null unable to determine the state of the host.
> Moving on.
> 2013-07-13 17:53:56,695 WARN  [agent.manager.AgentManagerImpl]
> (AgentTaskPool-1:null) Agent state cannot be determined, do nothing
>
> HA for KVM is almost useless.
>
> I suggest this be a blocker for any release until fixed.
>
>
> Regards,
>
> Paul Angus
> S: +44 20 3603 0540 | M: +447711418784 | T: CloudyAngus
> paul.an...@shapeblue.com
>
> -----Original Message-----
> From: Koushik Das [mailto:koushik....@citrix.com]
> Sent: 12 July 2013 12:21
> To: us...@cloudstack.apache.org
> Subject: RE: cs 4.1 host disconnected status
>
> I looked at the logs and none of the existing investigators are able
> to determine that the host is down. I am not sure if there is a clean
> way to identify whether a host is down in the case of KVM. Consider
> the following cases:
>
> 1. Host is actually shut down
> 2. Management NIC of the host is unplugged from the network but the
> host is up and running
>
> There is no clean way to distinguish these cases. CloudStack should
> only mark the host as down in the first case, but I am not sure how
> one would achieve this.
>
> -Koushik
>
> > -----Original Message-----
> > From: Valery Ciareszka [mailto:valery.teres...@gmail.com]
> > Sent: Friday, July 12, 2013 2:39 PM
> > To: us...@cloudstack.apache.org
> > Subject: Re: cs 4.1 host disconnected status
> >
> > I've simulated the crash again and here is the log:
> > http://thesuki.org/temp/cs.log.txt
> > I stripped out the GET requests containing API keys.
> > The server was switched off at 8:36.
> >
> > On Fri, Jul 12, 2013 at 11:17 AM, Koushik Das <koushik....@citrix.com> wrote:
> >
> > > Looks like the KVM investigator is not able to determine the state
> > > of the agent. Can you share the full log?
> > >
> > > > -----Original Message-----
> > > > From: Valery Ciareszka [mailto:valery.teres...@gmail.com]
> > > > Sent: Thursday, July 11, 2013 7:47 PM
> > > > To: users
> > > > Subject: cs 4.1 host disconnected status
> > > >
> > > > Hi all.
> > > >
> > > > I use the following environment: CS 4.1, KVM, CentOS 6.4
> > > > (management+node1+node2), with an OpenIndiana NFS server as
> > > > primary and secondary storage.
> > > > I have the following problem:
> > > > If I switch one hypervisor node off via IPMI (to simulate a server
> > > > crash), it never goes to Disconnected status in management.
> > > > Accordingly, HA-enabled VMs are not restarted on another
> > > > hypervisor node, because management believes that the
> > > > disconnected node is still online.
> > > >
> > > >
> > > > I get the following in the management server logs:
> > > >
> > > > 2013-07-11 10:19:16,153 DEBUG [agent.transport.Request]
> > > > (AgentManager-Handler-13:null) Seq 19-1133189098: Processing:
> > > > { Ans: , MgmtId: 161603152803976, via: 19, Ver: v1, Flags: 10,
> > > > [{"Answer":{"result":false,"details": "Unable to ping computing
> > > > host, exiting","wait":0}}] }
> > > > 2013-07-11 10:19:16,153 DEBUG [agent.transport.Request]
> > > > (AgentTaskPool-1:null) Seq 19-1133189098: Received: { Ans: , MgmtId:
> > > > 161603152803976, via: 19, Ver: v1, Flags: 10, { Answer } }
> > > > 2013-07-11 10:19:16,153 DEBUG [cloud.ha.AbstractInvestigatorImpl]
> > > > (AgentTaskPool-1:null) host (172.16.20.241) cannot be pinged,
> > > > returning null ('I don't know')
> > > > 2013-07-11 10:19:16,153 DEBUG [cloud.ha.UserVmDomRInvestigator]
> > > > (AgentTaskPool-1:null) could not reach agent, could not reach agent's
> > > > host, returning that we don't have enough information
> > > > 2013-07-11 10:19:16,153 DEBUG [cloud.ha.HighAvailabilityManagerImpl]
> > > > (AgentTaskPool-1:null) null unable to determine the state of the host.
> > > > Moving on.
> > > > 2013-07-11 10:19:16,153 DEBUG [cloud.ha.HighAvailabilityManagerImpl]
> > > > (AgentTaskPool-1:null) null unable to determine the state of the host.
> > > > Moving on.
> > > > 2013-07-11 10:19:16,153 WARN  [agent.manager.AgentManagerImpl]
> > > > (AgentTaskPool-1:null) Agent state cannot be determined, do nothing
> > > >
> > > >
> > > > If I power on the dead node, it goes to state "Connecting" and then
> > > > "Up" in the management interface.
> > > >
> > > > 2013-07-11 13:57:24,311 DEBUG [cloud.host.Status]
> > > > (Thread-6:null) Ping timeout for host 12, do invstigation
> > > > 2013-07-11 13:58:24,315 DEBUG [cloud.host.Status]
> > > > (Thread-6:null) Ping timeout for host 12, do invstigation
> > > > 2013-07-11 13:59:24,320 DEBUG [cloud.host.Status]
> > > > (Thread-6:null) Ping timeout for host 12, do invstigation
> > > > 2013-07-11 13:59:57,239 DEBUG [cloud.host.Status]
> > > > (AgentConnectTaskPool-5:null) Transition:[Resource state =
> > > > Enabled, Agent event = AgentConnected, Host id = 12, name =
> > > > ad112.colobridge.net]
> > > > 2013-07-11 13:59:57,264 DEBUG [cloud.host.Status]
> > > > (AgentConnectTaskPool-5:null) Agent status update: [id = 12;
> > > > name = ad112.colobridge.net; old status = Up; event =
> > > > AgentConnected; new status = Connecting; old update count = 1285;
> > > > new update count = 1286]
> > > > 2013-07-11 14:00:50,611 DEBUG [cloud.host.Status]
> > > > (AgentConnectTaskPool-5:null) Transition:[Resource state =
> > > > Enabled, Agent event = Ready, Host id = 12, name =
> > > > ad112.colobridge.net]
> > > > 2013-07-11 14:00:50,633 DEBUG [cloud.host.Status]
> > > > (AgentConnectTaskPool-5:null) Agent status update: [id = 12;
> > > > name = ad112.colobridge.net; old status = Connecting; event =
> > > > Ready; new status = Up; old update count = 1286; new update
> > > > count = 1287]
> > > >
> > > >
> > > > If I restart the cloud-management service, the dead node goes to
> > > > state "Disconnected" in the management interface.
> > > > (There is nothing special in the logs in this case.)
> > > >
> > > > If I do nothing, the dead node can stay in the "Up" state forever
> > > > (I waited for 12 hours) in the management interface, while the
> > > > logs show "Agent state cannot be determined, do nothing".
> > > >
> > > > I would appreciate it if someone could help or suggest how to deal
> > > > with this problem.
> > > >
> > > > --
> > > > Regards,
> > > > Valery
> > > >
> > > > http://protocol.by/slayer
> > >
> >
> >
> >
> > --
> > Regards,
> > Valery
> >
> > http://protocol.by/slayer
> This email and any attachments to it may be confidential and are
> intended solely for the use of the individual to whom it is addressed.
> Any views or opinions expressed are solely those of the author and do
> not necessarily represent those of Shape Blue Ltd or related
> companies. If you are not the intended recipient of this email, you
> must neither take any action based upon its contents, nor copy or show
> it to anyone. Please contact the sender if you believe you have received this 
> email in error.
> Shape Blue Ltd is a company incorporated in England & Wales. ShapeBlue
> Services India LLP is operated under license from Shape Blue Ltd.
> ShapeBlue is a registered trademark.


Best,

jzb
--
Joe Brockmeier
j...@zonker.net
Twitter: @jzb
http://www.dissociatedpress.net/

