Can you observe the status of the SSVM (is it Up/Connecting/Disconnected/Down) while the issue is happening?
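If you want to record that state over time instead of watching the UI, one option is a small script against the API. This is a minimal Python sketch under several assumptions: you have an API key/secret pair, the API is reachable at the usual `/client/api` path, `listHosts` with `type=SecondaryStorageVM` returns the SSVM agent entries, and the JSON response uses the `listhostsresponse` / `host` / `name` / `state` field names. Check all of these against your CloudStack version. The helper names are hypothetical. Signing follows the usual CloudStack scheme: sort the parameters, URL-encode the values, lowercase the query string, HMAC-SHA1 it with the secret key, then base64-encode and URL-encode the result.

```python
import base64
import hashlib
import hmac
import json
import urllib.parse
import urllib.request


def _encode(params):
    # Sort parameters by name (case-insensitively) and URL-encode each value,
    # encoding spaces as %20 rather than '+'.
    return "&".join(
        f"{name}={urllib.parse.quote(str(value), safe='')}"
        for name, value in sorted(params.items(), key=lambda kv: kv[0].lower())
    )


def sign_request(params, secret_key):
    # HMAC-SHA1 over the lowercased, sorted query string, base64-encoded.
    digest = hmac.new(
        secret_key.encode("utf-8"),
        _encode(params).lower().encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def build_ssvm_hosts_url(endpoint, api_key, secret_key):
    # Assumed: listHosts filtered to the SecondaryStorageVM host type.
    params = {
        "command": "listHosts",
        "type": "SecondaryStorageVM",
        "response": "json",
        "apiKey": api_key,
    }
    signature = urllib.parse.quote(sign_request(params, secret_key), safe="")
    return f"{endpoint}?{_encode(params)}&signature={signature}"


def ssvm_states(url, timeout=10.0):
    # Map each SSVM host name (e.g. s-2775-VM) to its reported agent state.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = json.load(resp)
    hosts = data.get("listhostsresponse", {}).get("host", [])
    return {h.get("name"): h.get("state") for h in hosts}
```

You would call `ssvm_states(build_ssvm_hosts_url("http://<mgmt-server>:8080/client/api", key, secret))` from cron or a loop every minute and log the result next to a timestamp, so it can be lined up with the management-server log afterwards.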
I would advise checking your Secondary Storage itself, and also running the SSVM diagnostic script /usr/local/cloud/systemvm/ssvm-check.sh to see whether it reports any errors with NFS or anything else. Lastly (and don't laugh), check that you don't have issues with your networking equipment: some of us had VEEEERY strange connectivity issues some years ago with crappy QCT/Quanta switches in an MLAG setup.

Andrija

On Thu, 25 Jul 2019 at 15:42, Rakesh v <www.rakeshv....@gmail.com> wrote:

> Yes, I have set the IPs of the three MGT servers in the "host" field.
>
> > On 25-Jul-2019, at 2:14 PM, Pierre-Luc Dion <pdion...@apache.org> wrote:
> >
> > Do you have a load balancer in front of CloudStack? Did you set the global
> > setting "host" to the IP of the mgmt server?
> >
> >
> > On Thu, 25 Jul 2019 at 03:24, Rakesh Venkatesh <www.rakeshv....@gmail.com>
> > wrote:
> >
> >> Hello people,
> >>
> >> I have a strange issue where the mgt server times out sending a command to
> >> the secondary storage VM every hour, and because of this the UI is not
> >> accessible for a short time. Sometimes I have to restart the mgt server to
> >> get it back to a working state, and sometimes I don't. I also see some
> >> exceptions while fetching the storage stats.
> >>
> >> The log says the secondary storage VM is lagging behind the mgt server in
> >> ping, and the mgt server then sends a disconnect message to other
> >> components. Can you let me know how to troubleshoot this issue? I destroyed
> >> the secondary storage VM, but the issue still persists. I checked the
> >> date/time on the mgt server and the SSVM, and they are the same. This has
> >> been happening for quite a few days now.
Below are > >> the logs > >> > >> > >> > >> 2019-07-25 04:01:22,769 INFO [c.c.a.m.AgentManagerImpl] > >> (AgentMonitor-1:ctx-c33dbe74) (logid:5442158c) Found the following > agents > >> behind on ping: [183] > >> 2019-07-25 04:01:22,775 WARN [c.c.a.m.AgentManagerImpl] > >> (AgentMonitor-1:ctx-c33dbe74) (logid:5442158c) Disconnect agent for > >> CPVM/SSVM due to physical connection close. host: 183 > >> 2019-07-25 04:01:22,778 INFO [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Host 183 is > disconnecting > >> with event ShutdownRequested > >> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) The next status of agent > >> 183is Disconnected, current status is Up > >> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Deregistering link for > 183 > >> with state Disconnected > >> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Remove Agent : 183 > >> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.ConnectedAgentAttache] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Processing Disconnect. 
> >> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentAttache] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Seq > >> 183-7541559051008607242: Sending disconnect to class > >> com.cloud.agent.manager.SynchronousListener > >> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to > >> listener: com.cloud.hypervisor.xenserver.discoverer.XcpServerDiscoverer > >> 2019-07-25 04:01:22,782 DEBUG [c.c.u.n.NioConnection] > >> (pool-2-thread-1:null) (logid:) Closing socket Socket[addr=/ > 172.30.32.16 > >> ,port=38250,localport=8250] > >> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentAttache] > >> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq > >> 183-7541559051008607242: Waiting some more time because this is the > current > >> command > >> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to > >> listener: com.cloud.hypervisor.hyperv.discoverer.HypervServerDiscoverer > >> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentAttache] > >> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq > >> 183-7541559051008607242: Waiting some more time because this is the > current > >> command > >> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to > >> listener: com.cloud.deploy.DeploymentPlanningManagerImpl > >> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to > >> listener: com.cloud.network.security.SecurityGroupListener > >> 2019-07-25 04:01:22,783 INFO [c.c.u.e.CSExceptionErrorCode] > >> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Could not find > exception: > >> com.cloud.exception.OperationTimedoutException in error code list for > >> exceptions > >> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) 
Sending Disconnect to > >> listener: org.apache.cloudstack.engine.orchestration.NetworkOrchestrator > >> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to > >> listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl > >> 2019-07-25 04:01:22,783 WARN [c.c.a.m.AgentAttache] > >> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq > >> 183-7541559051008607242: Timed out on null > >> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to > >> listener: com.cloud.storage.listener.StoragePoolMonitor > >> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentAttache] > >> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq > >> 183-7541559051008607242: Cancelling. > >> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to > >> listener: com.cloud.storage.secondary.SecondaryStorageListener > >> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to > >> listener: com.cloud.network.SshKeysDistriMonitor > >> 2019-07-25 04:01:22,785 DEBUG [o.a.c.s.RemoteHostEndPoint] > >> (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Failed to send command, > >> due to Agent:183, com.cloud.exception.OperationTimedoutException: > Commands > >> 7541559051008607242 to Host 183 timed out after 3600 > >> 2019-07-25 04:01:22,785 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to > >> listener: com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl > >> 2019-07-25 04:01:22,785 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to > >> listener: com.cloud.storage.download.DownloadListener > >> > >> > >> > >> > >> 2019-07-25 04:01:22,785 ERROR [c.c.s.StatsCollector] > >> 
(StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Error trying to > retrieve > >> storage stats > >> com.cloud.utils.exception.CloudRuntimeException: Failed to send command, > >> due to Agent:183, com.cloud.exception.OperationTimedoutException: > Commands > >> 7541559051008607242 to Host 183 timed out after 3600 > >> at > >> > >> > org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:133) > >> at > >> > >> > com.cloud.server.StatsCollector$StorageCollector.runInContext(StatsCollector.java:1139) > >> at > >> > >> > org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49) > >> at > >> > >> > org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56) > >> at > >> > >> > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103) > >> at > >> > >> > org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53) > >> at > >> > >> > org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46) > >> at > >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > >> at > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > >> at > >> > >> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > >> at > >> > >> > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > >> at > >> > >> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > >> at > >> > >> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > >> at java.lang.Thread.run(Thread.java:748) > >> 2019-07-25 04:01:22,786 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to > >> listener: 
com.cloud.consoleproxy.ConsoleProxyListener > >> 2019-07-25 04:01:22,789 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to > >> listener: com.cloud.storage.LocalStoragePoolListener > >> 2019-07-25 04:01:22,789 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to > >> listener: com.cloud.storage.upload.UploadListener > >> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to > >> listener: com.cloud.capacity.StorageCapacityListener > >> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to > >> listener: com.cloud.capacity.ComputeCapacityListener > >> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to > >> listener: com.cloud.network.SshKeysDistriMonitor > >> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to > >> listener: com.cloud.network.router.VirtualNetworkApplianceManagerImpl > >> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to > >> listener: > >> com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener > >> 2019-07-25 04:01:22,791 DEBUG [c.c.n.NetworkUsageManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Disconnected called on > 183 > >> with status Disconnected > >> > >> > >> > >> > >> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to > >> listener: com.cloud.agent.manager.AgentManagerImpl$BehindOnPingListener > >> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl] > >> (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to 
> >> listener: com.cloud.agent.manager.AgentManagerImpl$SetHostParamsListener > >> 2019-07-25 04:01:22,791 DEBUG [c.c.h.Status] > (AgentTaskPool-1:ctx-66de2057) > >> (logid:841d2a63) Transition:[Resource state = Enabled, Agent event = > >> ShutdownRequested, Host id = 183, name = s-2775-VM] > >> > >> > >> > >> -- > >> Thanks and regards > >> Rakesh venkatesh > >> > -- Andrija Panić
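Since the disconnect is reported to recur every hour, one quick check is to pull the "behind on ping" events for the SSVM host (id 183 in the logs quoted above) out of the management-server log and look at the spacing between them. Below is a minimal Python sketch. It assumes the log line layout shown in the excerpt (`<date> <time>,<ms> <LEVEL> [<logger>] (<thread>) (logid:<id>) <message>`), and the regex and function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt:
# "<date> <time>,<ms> <LEVEL> [<logger>] (<thread>) (logid:<id>) <message>"
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]*)\)\s+\(logid:(?P<logid>[^)]*)\)\s+(?P<msg>.*)$"
)
# Matches "Found the following agents behind on ping: [183, 184]".
PING_RE = re.compile(r"behind on ping: \[(?P<ids>[\d, ]*)\]")
TS_FMT = "%Y-%m-%d %H:%M:%S,%f"


def behind_on_ping_times(lines, host_id):
    """Timestamps of 'behind on ping' events that list the given host id."""
    times = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # stack-trace frames and other non-header lines
        p = PING_RE.search(m["msg"])
        if not p:
            continue
        ids = {int(x) for x in p["ids"].split(",") if x.strip()}
        if host_id in ids:
            times.append(datetime.strptime(m["ts"], TS_FMT))
    return times


def gaps_in_minutes(times):
    """Minutes elapsed between consecutive events."""
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]
```

To run it, open the management-server log (commonly `/var/log/cloudstack/management/management-server.log`, though your path may differ) and call `gaps_in_minutes(behind_on_ping_times(f, 183))`. A list of values clustered around 60 would confirm the hourly pattern. Irregular gaps would point more toward something intermittent, such as the network issue Andrija mentions.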