Do your VRs stay online and connected? The next step is to check cloud.log on the system VMs, and possibly raise the log verbosity to catch why they are dropping comms.
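To make that log check less tedious, here is a minimal Python sketch that pulls the disconnect-related entries out of a log. It is not part of CloudStack: the function name `find_disconnect_events` and the marker phrases are assumptions taken from the excerpt quoted in this thread, and it expects one log entry per line (not the wrapped form an email client produces).

```python
import re
from datetime import datetime

# Entry prefix as seen in the quoted log: timestamp, level, [logger], message.
ENTRY = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\]\s+(?P<msg>.*)$"
)

# Assumed marker phrases, copied from the excerpt in this thread; extend as needed.
MARKERS = ("behind on ping", "disconnect agent", "timed out")


def find_disconnect_events(lines):
    """Return (timestamp, level, logger, message) tuples for matching entries."""
    events = []
    for line in lines:
        m = ENTRY.match(line.strip())
        if m is None:
            continue  # continuation lines such as stack traces have no prefix
        msg = m["msg"]
        if any(marker in msg.lower() for marker in MARKERS):
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, m["level"], m["logger"], msg))
    return events
```

Pass it any iterable of lines, for example an opened log file, and compare the timestamps of successive events to see whether the disconnects follow a regular interval.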

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

On 23/02/2018, 15:25, "Chen Zhang" <[email protected]> wrote:

Hi Dag,

Yes, I did recreate the system VMs. The version is "Cloudstack release 4.11.0".

Thanks!
Chen

On Fri, Feb 23, 2018 at 9:27 AM, Dag Sonstebo <[email protected]> wrote:

> Hi Chen,
>
> You say you just upgraded to 4.11 – did you destroy your system VMs and
> let them recreate after the upgrade?
>
> Can you also check what version a “cat /etc/cloudstack-release” shows up
> with on your SSVM/CPVM?
>
> Regards,
> Dag Sonstebo
> Cloud Architect
> ShapeBlue
>
> On 23/02/2018, 14:00, "Chen Zhang" <[email protected]> wrote:
>
> Hello,
>
> I am new to the list and I am stuck with a very annoying issue on
> CPVM/SSVM.
>
> When I start the Cloudstack-management, everything is good. After
> around 3-4 hours, the agent state of CPVM/SSVM automatically turns to
> "Disconnected" and the secondary storage goes to "0kb/0kb", but the VM
> state is still "running". After I manually reboot the CPVM/SSVM, the
> agent state turns back to "up" and the secondary storage comes back as
> well. After 3-4 hours, the issue repeats.
>
> Here is the log when SSVM/CPVM goes down:
>
> ----
> 2018-02-21 15:57:47,517 INFO [c.c.a.m.AgentManagerImpl] (AgentMonitor-1:ctx-81471e1e) (logid:d0bdac05) Found the following agents behind on ping: [3]
> 2018-02-21 15:57:47,521 WARN [c.c.a.m.AgentManagerImpl] (AgentMonitor-1:ctx-81471e1e) (logid:d0bdac05) Disconnect agent for CPVM/SSVM due to physical connection close. host: 3
> 2018-02-21 15:57:47,522 INFO [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Host 3 is disconnecting with event ShutdownRequested
> 2018-02-21 15:57:47,524 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) The next status of agent 3is Disconnected, current status is Up
> 2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Deregistering link for 3 with state Disconnected
> 2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Remove Agent : 3
> 2018-02-21 15:57:47,525 DEBUG [c.c.a.m.ConnectedAgentAttache] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Processing Disconnect.
> 2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Seq 3-906630899985023222: Sending disconnect to class com.cloud.agent.manager.SynchronousListener
> 2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.hypervisor.xenserver.discoverer.XcpServerDiscoverer
> 2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.hypervisor.hyperv.discoverer.HypervServerDiscoverer
> 2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.storage.listener.StoragePoolMonitor
> 2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: org.apache.cloudstack.engine.orchestration.NetworkOrchestrator
> 2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.storage.secondary.SecondaryStorageListener
> 2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.network.security.SecurityGroupListener
> 2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache] (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222: Waiting some more time because this is the current command
> 2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.deploy.DeploymentPlanningManagerImpl
> 2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl
> 2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.network.SshKeysDistriMonitor
> 2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.network.router.VirtualNetworkApplianceManagerImpl
> 2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.consoleproxy.ConsoleProxyListener
> 2018-02-21 15:57:47,525 DEBUG [c.c.a.m.AgentAttache] (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222: Waiting some more time because this is the current command
> 2018-02-21 15:57:47,526 INFO [c.c.u.e.CSExceptionErrorCode] (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Could not find exception: com.cloud.exception.OperationTimedoutException in error code list for exceptions
> 2018-02-21 15:57:47,526 WARN [c.c.a.m.AgentAttache] (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222: Timed out on null
> 2018-02-21 15:57:47,526 DEBUG [c.c.a.m.AgentAttache] (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Seq 3-906630899985023222: Cancelling.
> 2018-02-21 15:57:47,526 DEBUG [o.a.c.s.RemoteHostEndPoint] (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Failed to send command, due to Agent:3, com.cloud.exception.OperationTimedoutException: Commands 906630899985023222 to Host 3 timed out after 3600
> 2018-02-21 15:57:47,526 ERROR [c.c.s.StatsCollector] (StatsCollector-4:ctx-410838d0) (logid:4efe8dd2) Error trying to retrieve storage stats
> com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:3, com.cloud.exception.OperationTimedoutException: Commands 906630899985023222 to Host 3 timed out after 3600
>     at org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:133)
>     at com.cloud.server.StatsCollector$StorageCollector.runInContext(StatsCollector.java:985)
>     at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>     at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> 2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener
> 2018-02-21 15:57:47,527 DEBUG [c.c.n.NetworkUsageManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Disconnected called on 3 with status Disconnected
> 2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.agent.manager.AgentManagerImpl$BehindOnPingListener
> 2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.agent.manager.AgentManagerImpl$SetHostParamsListener
> 2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.capacity.StorageCapacityListener
> 2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.capacity.ComputeCapacityListener
> 2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.network.SshKeysDistriMonitor
> 2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl
> 2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.storage.LocalStoragePoolListener
> 2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.storage.upload.UploadListener
> 2018-02-21 15:57:47,527 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Sending Disconnect to listener: com.cloud.storage.download.DownloadListener
> 2018-02-21 15:57:47,527 DEBUG [c.c.h.Status] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Transition:[Resource state = Enabled, Agent event = ShutdownRequested, Host id = 3, name = s-1-VM]
> 2018-02-21 15:57:47,620 DEBUG [c.c.a.m.ClusteredAgentManagerImpl] (AgentTaskPool-7:ctx-67ec16e3) (logid:d6a36e24) Notifying other nodes of to disconnect
> ----
>
> When the issue arises, all instances, hosts, and other resources are
> running fine. I just updated the cloudstack-management and
> cloudstack-agent to 4.11, but the problem is still there. Any ideas?
>
> Thanks!
>
> Chen
>
> [email protected]
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London WC2N 4HS UK
> @shapeblue
