Sheng Yang created CLOUDSTACK-689: ------------------------------------- Summary: RVR: Stop pending flag is not cleared when user start the disconnected router from another host Key: CLOUDSTACK-689 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-689 Project: CloudStack Issue Type: Bug Security Level: Public (Anyone can view this level - this is the default.) Components: Network Controller Affects Versions: 4.0.0 Reporter: Sheng Yang Assignee: Sheng Yang Fix For: 4.1.0
Here is what's happened: 2012-09-16 21:27:24,285 WARN [network.router.VirtualNetworkApplianceManagerImpl] (HA-Worker-0:work-37779) Unable to apply dhcp entry on disconnected router r-38445-VM com.cloud.exception.AgentUnavailableException: Resource [Host:610] is unreachable: Host 610: Host is not in the right state: Down [...] 2012-09-16 21:27:24,286 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (HA-Worker-0:work-37779) About to stop the router r-38445-VM due to: Unable to apply dhcp entry on disconnected router At the above time, we set stop pending flag, due to host 610 is down, and we failed to reach the router r-38445-VM. Then: 2012-09-16 21:51:54,067 DEBUG [cloud.async.AsyncJobManagerImpl] (catalina-exec-28:null) submit async job-296045, details: AsyncJobVO {id:296045, userId: 6217, accountId: 6178, sessionKey: null, instanceType: DomainRouter, instanceId: 38445, cmd: com.cloud.api.commands.StartRouterCmd, cmdOriginator: null, cmdInfo: {"response":"json","id":"f14313a7-5490-408f-a9d7-292303784f8f","sessionkey":"lByhfEpE7fygtOASvMH3i8xgEok\u003d","ctxUserId":"6217","_":"1347799961307","ctxAccountId":"6178","ctxStartEventId":"1533940"}, cmdVersion: 0, callbackType: 0, callbackAddress: null, status: 0, processStatus: 0, resultCode: 0, result: null, initMsid: 152687106451262, completeMsid: null, lastUpdated: null, lastPolled: null, created: null} 2012-09-16 21:51:54,070 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-126:job-296045) Executing com.cloud.api.commands.StartRouterCmd for job-296045 User start the router manually, and CS deploy it on host 606: 2012-09-16 21:51:56,069 DEBUG [cloud.vm.VirtualMachineManagerImpl] (Job-Executor-126:job-296045) Deployment found - P0=VM[DomainRouter|r-38445-VM], P0=Dest[Zone(Id)-Pod(Id)-Cluster(Id)-Host(Id)-Storage(Volume(Id|Type-->Pool(Id))] : Dest[Zone( 2)-Pod(29)-Cluster(52)-Host(606)-Storage()] 2012-09-16 21:51:56,090 DEBUG [cloud.capacity.CapacityManagerImpl] (Job-Executor-126:job-296045) VM state transitted from :Starting to Starting with event: OperationRetryvm's original host id: 610 new host id: 606 host id before state transiti on: null We didn't cover this situation in our code. And this won't clear stop pending flag of course. Also the stop pending flag can be set if the command sent to router is time out, but host is not detected as Disconnected by CS. See the mgmt log attached for r-5-VM: 2012-09-29 22:20:55,413 DEBUG [agent.manager.AgentAttache] (Job-Executor-130:job-450) Seq 1-1662189781: Cancelling. 2012-09-29 22:20:55,413 WARN [network.router.VirtualNetworkApplianceManagerImpl] (Job-Executor-130:job-450) Timed Out com.cloud.exception.OperationTimedoutException: Commands 1662189781 to Host 1 timed out after 3600 at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:425) at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:501) at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:1022) at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.sendCommandsToRouter(VirtualNetworkApplianceManagerImpl.java:2590) at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.sendPortForwardingRules(VirtualNetworkApplianceManagerImpl.java:2725) at com.cloud.network.router.VirtualNetworkApplianceManagerImpl$5.execute(VirtualNetworkApplianceManagerImpl.java:2703) at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.applyRules(VirtualNetworkApplianceManagerImpl.java:2809) at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.applyFirewallRules(VirtualNetworkApplianceManagerImpl.java:2688) at com.cloud.network.element.VirtualRouterElement.applyPFRules(VirtualRouterElement.java:636) at com.cloud.network.NetworkManagerImpl.applyRules(NetworkManagerImpl.java:3245) at com.cloud.network.firewall.FirewallManagerImpl.applyRules(FirewallManagerImpl.java:374) at com.cloud.network.rules.RulesManagerImpl.applyPortForwardingRulesForNetwork(RulesManagerImpl.java:714) at com.cloud.network.NetworkManagerImpl.reprogramNetworkRules(NetworkManagerImpl.java:3516) at com.cloud.network.NetworkManagerImpl.implementNetworkElementsAndResources(NetworkManagerImpl.java:1812) at com.cloud.network.NetworkManagerImpl.implementNetwork(NetworkManagerImpl.java:1754) at com.cloud.utils.db.DatabaseCallback.intercept(DatabaseCallback.java:30) at com.cloud.network.NetworkManagerImpl.startNetwork(NetworkManagerImpl.java:3436) at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.startRouter(VirtualNetworkApplianceManagerImpl.java:2285) at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.startRouter(VirtualNetworkApplianceManagerImpl.java:2255) at com.cloud.event.ActionEventCallback.intercept(ActionEventCallback.java:32) at com.cloud.api.commands.StartRouterCmd.execute(StartRouterCmd.java:101) at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:134) at com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:427) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636) 2012-09-29 22:20:55,414 WARN [network.router.VirtualNetworkApplianceManagerImpl] (Job-Executor-130:job-450) Unable to apply firewall rules on disconnected router r-5-VM com.cloud.exception.AgentUnavailableException: Resource [Host:1] is unreachable: Host 1: Unable to send commands to virtual router at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.sendCommandsToRouter(VirtualNetworkApplianceManagerImpl.java:2593) at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.sendPortForwardingRules(VirtualNetworkApplianceManagerImpl.java:2725) at com.cloud.network.router.VirtualNetworkApplianceManagerImpl$5.execute(VirtualNetworkApplianceManagerImpl.java:2703) at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.applyRules(VirtualNetworkApplianceManagerImpl.java:2809) at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.applyFirewallRules(VirtualNetworkApplianceManagerImpl.java:2688) at com.cloud.network.element.VirtualRouterElement.applyPFRules(VirtualRouterElement.java:636) at com.cloud.network.NetworkManagerImpl.applyRules(NetworkManagerImpl.java:3245) at com.cloud.network.firewall.FirewallManagerImpl.applyRules(FirewallManagerImpl.java:374) at com.cloud.network.rules.RulesManagerImpl.applyPortForwardingRulesForNetwork(RulesManagerImpl.java:714) at com.cloud.network.NetworkManagerImpl.reprogramNetworkRules(NetworkManagerImpl.java:3516) at com.cloud.network.NetworkManagerImpl.implementNetworkElementsAndResources(NetworkManagerImpl.java:1812) at com.cloud.network.NetworkManagerImpl.implementNetwork(NetworkManagerImpl.java:1754) at com.cloud.utils.db.DatabaseCallback.intercept(DatabaseCallback.java:30) at com.cloud.network.NetworkManagerImpl.startNetwork(NetworkManagerImpl.java:3436) at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.startRouter(VirtualNetworkApplianceManagerImpl.java:2285) at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.startRouter(VirtualNetworkApplianceManagerImpl.java:2255) at com.cloud.event.ActionEventCallback.intercept(ActionEventCallback.java:32) at com.cloud.api.commands.StartRouterCmd.execute(StartRouterCmd.java:101) at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:134) at com.cloud.async.AsyncJobManagerImpl$1.run(AsyncJobManagerImpl.java:427) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636) Caused by: com.cloud.exception.OperationTimedoutException: Commands 1662189781 to Host 1 timed out after 3600 at com.cloud.agent.manager.AgentAttache.send(AgentAttache.java:425) at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:501) at com.cloud.agent.manager.AgentManagerImpl.send(AgentManagerImpl.java:1022) at com.cloud.network.router.VirtualNetworkApplianceManagerImpl.sendCommandsToRouter(VirtualNetworkApplianceManagerImpl.java:2590) ... 33 more 2012-09-29 22:20:55,415 DEBUG [network.router.VirtualNetworkApplianceManagerImpl] (Job-Executor-130:job-450) About to stop the router r-5-VM due to: Unable to apply firewall rules on disconnected router -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira