Sowmya Krishnan created CLOUDSTACK-3938:
-------------------------------------------
Summary: Operation Timed out and Resource unreachable exceptions in clustered management server setup
Key: CLOUDSTACK-3938
URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3938
Project: CloudStack
Issue Type: Bug
Security Level: Public (Anyone can view this level - this is the default.)
Components: Management Server
Affects Versions: 4.2.0
Environment: 4.2, using simulator to set up a load test env
Reporter: Sowmya Krishnan
Priority: Blocker
Fix For: 4.2.0

Setup: 2 management servers, with the MySQL DB running on a remote server, using simulated hosts and resources.
Deployed an advanced zone with RVR and began setting up simulator hosts and storage pools. After deploying a few of them, the following exceptions appear constantly:

2013-07-30 02:34:02,468 DEBUG [agent.transport.Request] (StatsCollector-2:null) Seq 2-197787660: Received: { Ans: , MgmtId: 206915885097283, via: 2, Ver: v1, Flags: 10, { GetStorageStatsAnswer } }
2013-07-30 02:34:02,472 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-35:null) Seq 1-1644953610: Response Received:
2013-07-30 02:34:02,472 DEBUG [agent.transport.Request] (StatsCollector-3:null) Seq 1-1644953610: Received: { Ans: , MgmtId: 206915885097283, via: 1, Ver: v1, Flags: 10, { GetHostStatsAnswer } }
2013-07-30 02:34:02,476 DEBUG [agent.manager.ClusteredAgentAttache] (StatsCollector-2:null) Seq 3-2136014853: Forwarding null to 206915885094132
2013-07-30 02:34:02,478 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-15:null) Seq 3-2136014853: Routing from 206915885097283
2013-07-30 02:34:02,478 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-15:null) Seq 3-2136014853: Link is closed
2013-07-30 02:34:02,478 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-15:null) Seq 3-2136014853: MgmtId 206915885097283: Req: Resource [Host:3] is unreachable: Host 3: Link is closed
2013-07-30 02:34:02,479 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-15:null) Seq 3--1: MgmtId 206915885097283: Req: Routing to peer
2013-07-30 02:34:02,481 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-16:null) Seq 3--1: MgmtId 206915885097283: Req: Cancel request received
2013-07-30 02:34:02,481 DEBUG [agent.manager.AgentAttache] (AgentManager-Handler-16:null) Seq 3-2136014853: Cancelling.
2013-07-30 02:34:02,481 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 3-2136014853: Waiting some more time because this is the current command
2013-07-30 02:34:02,481 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 3-2136014853: Waiting some more time because this is the current command
2013-07-30 02:34:02,481 INFO [utils.exception.CSExceptionErrorCode] (StatsCollector-2:null) Could not find exception: com.cloud.exception.OperationTimedoutException in error code list for exceptions
2013-07-30 02:34:02,482 WARN [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 3-2136014853: Timed out on null
2013-07-30 02:34:02,482 DEBUG [agent.manager.AgentAttache] (StatsCollector-2:null) Seq 3-2136014853: Cancelling.
2013-07-30 02:34:02,482 DEBUG [cloud.storage.StorageManagerImpl] (StatsCollector-2:null) Unable to send storage pool command to Pool[2|NetworkFilesystem] via 3
com.cloud.exception.OperationTimedoutException: Commands 2136014853 to Host 3 timed out after 3600
        at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:430)
        at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:486)
        at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:439)
        at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:977)
        at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:428)
        at com.cloud.storage.StorageManagerImpl.sendToPool(StorageManagerImpl.java:442)
        at com.cloud.server.StatsCollector$StorageCollector.run(StatsCollector.java:562)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:679)
2013-07-30 02:34:02,483 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-34:null) Seq 2-197787661: Executing request
2013-07-30 02:34:02,488 DEBUG [agent.manager.ClusteredAgentAttache] (StatsCollector-2:null) Seq 4-1100414979: Forwarding null to 206915885094132
2013-07-30 02:34:02,489 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-34:null) Seq 2-197787661: Response Received:
2013-07-30 02:34:02,489 DEBUG [agent.transport.Request] (StatsCollector-3:null) Seq 2-197787661: Received: { Ans: , MgmtId: 206915885097283, via: 2, Ver: v1, Flags: 10, { GetHostStatsAnswer } }
2013-07-30 02:34:02,490 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-17:null) Seq 4-1100414979: Routing from 206915885097283
2013-07-30 02:34:02,490 DEBUG [agent.manager.ClusteredAgentAttache] (AgentManager-Handler-17:null) Seq 4-1100414979: Link is closed
2013-07-30 02:34:02,490 DEBUG [agent.manager.ClusteredAgentManagerImpl] (AgentManager-Handler-17:null) Seq 4-1100414979: MgmtId 206915885097283: Req: Resource [Host:4] is unreachable: Host 4: Link is closed

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira