It is irrelevant whether mgmt sends a (TCP) disconnect to the SSVM, since that
should not affect regular OS pings from mgmt to the SSVM. If you have issues
pinging from mgmt to the hypervisor node, then again I would bet on
misbehaving switch equipment.
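To tell the two paths apart (ICMP echo, which a disconnect should not affect, versus the agent's TCP session), a minimal Python sketch can test each one separately from the host where it runs. It assumes Linux iputils `ping` (`-c` count, `-W` reply timeout in seconds) and assumes the SSVM agent reaches the management server on TCP port 8250, as the `localport=8250` entry in the log suggests. The helper names and the hints returned by `classify` are hypothetical, not part of CloudStack.

```python
import socket
import subprocess


def icmp_ok(host: str, timeout_s: int = 2) -> bool:
    """Send a single ICMP echo via the system ping (Linux iputils flags assumed)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:  # ping missing or not executable on this machine
        return False
    return result.returncode == 0


def tcp_ok(host: str, port: int = 8250, timeout_s: float = 2.0) -> bool:
    """Attempt a plain TCP connect; port 8250 is an assumption taken from the log."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable or timed out (socket.timeout is an OSError)
        return False


def classify(icmp: bool, tcp: bool) -> str:
    """Turn the two results into a rough, hypothetical hint about where to look."""
    if icmp and tcp:
        return "both paths up"
    if icmp:
        return "ICMP up, TCP down: check the agent/service or a firewall"
    if tcp:
        return "TCP up, ICMP down: ICMP may simply be filtered"
    return "both down: suspect the network path (switches, links)"
```

For example, calling `classify(icmp_ok(ip), tcp_ok(ip))` from the SSVM against a management server IP every few seconds during the hourly event would show whether only the TCP session drops or the whole path does.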

On Thu, 25 Jul 2019 at 18:52, Rakesh v <www.rakeshv....@gmail.com> wrote:

> True, but I thought that since mgt sends a disconnect to the SSVM, it would
> reset the interface it is connecting on.
>
> Sent from my iPhone
>
> > On 25-Jul-2019, at 4:57 PM, Andrija Panic <andrija.pa...@gmail.com> wrote:
> >
> > In your previous mail, I understood that you used the OS tool "ping" and
> > were NOT referring to internal ACS pings?
> > "Out of all these, the ping drops were observed from MGT server to ssvm and
> > mgt server to nodes. Basically all nodes lost connection. Then it recovered
> > itself after 1 minute."
> >
> >
> > >> On Thu, 25 Jul 2019 at 16:55, Rakesh v <www.rakeshv....@gmail.com> wrote:
> >>
> >> The ping between the mgt server and the SSVM fails because mgt sends a
> >> disconnect message to all nodes. If you look at the logs I pasted in my
> >> first email, the mgt server thinks the SSVM is lagging behind on ping and
> >> sends a disconnect message for all nodes without investigating. Also, it
> >> happens at the beginning of every hour.
> >>
> >>
> >> So I'm sure network is not the issue here.
> >>
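The "behind on ping" decision in the log can be modelled roughly as follows. This is a simplified Python sketch, not CloudStack's Java implementation: it assumes the management server flags an agent whose last ping is older than `ping.interval * ping.timeout` seconds, and the defaults used here (60 seconds and a multiplier of 2.5) are assumptions to verify against your own global settings. `agents_behind_on_ping` is a hypothetical name.

```python
from typing import Dict, List


def agents_behind_on_ping(
    last_ping: Dict[int, float],
    now: float,
    ping_interval_s: float = 60.0,  # assumed default of the ping.interval setting
    ping_timeout: float = 2.5,  # assumed default of ping.timeout (a multiplier)
) -> List[int]:
    """Return ids of hosts whose last ping is older than interval * timeout seconds."""
    cutoff = now - ping_interval_s * ping_timeout
    return sorted(host for host, ts in last_ping.items() if ts < cutoff)


if __name__ == "__main__":
    now = 1000.0
    # host 183 last pinged 200 s ago, host 42 only 30 s ago
    print(agents_behind_on_ping({183: now - 200, 42: now - 30}, now))  # [183]
```

With those assumed defaults the cutoff is 60 * 2.5 = 150 seconds, so host 183 in the example (silent for 200 seconds) is reported while host 42 is not.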
> >>> On 25-Jul-2019, at 4:46 PM, Andrija Panic <andrija.pa...@gmail.com> wrote:
> >>>
> >>> Since basic network connectivity (ping) was failing between the mgmt
> >>> servers and the nodes (and the SSVM on them), I would point my finger at
> >>> your networking equipment, i.e. I expect zero problems with ACS itself
> >>> (since pings fail).
> >>>
> >>> Let us know how it goes.
> >>>
> >>> Andrija
> >>>
> >>>> On Thu, 25 Jul 2019 at 16:04, Rakesh v <www.rakeshv....@gmail.com> wrote:
> >>>>
> >>>> Yes, I was monitoring it continuously. Below are the checks I was
> >>>> running when the issue happened:
> >>>>
> >>>>
> >>>> 1. Ping from MGT server to ssvm
> >>>> 2. Ping from ssvm to secondary storage ip
> >>>> 3. Ping from ssvm to public IP like 8.8.8.8
> >>>> 4. Ping from MGT server to node in which ssvm was running
> >>>>
> >>>>
> >>>> Out of all these, the ping drops were observed from MGT server to ssvm and
> >>>> mgt server to nodes. Basically all nodes lost connection. Then it recovered
> >>>> itself after 1 minute.
> >>>>
> >>>>
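For the four continuous pings listed above, turning raw samples into outage windows makes it easier to line the roughly one-minute drop up against the management server log timestamps. A minimal Python sketch, assuming each target's results are already collected as `(unix_timestamp, reply_received)` pairs (for example from a loop around the system `ping`); `outage_windows` is a hypothetical helper, not a CloudStack tool.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, bool]  # (unix timestamp, reply received?)


def outage_windows(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Collapse consecutive failed samples into (first_fail, last_fail) windows."""
    windows: List[Tuple[float, float]] = []
    start: Optional[float] = None
    last_fail: Optional[float] = None
    for ts, ok in sorted(samples):  # order by timestamp first
        if not ok:
            if start is None:
                start = ts  # a new outage begins
            last_fail = ts
        elif start is not None:
            windows.append((start, last_fail))  # a reply ends the outage
            start = None
    if start is not None:  # outage still ongoing at the last sample
        windows.append((start, last_fail))
    return windows


if __name__ == "__main__":
    samples = [(0.0, True), (1.0, False), (2.0, False), (3.0, True), (4.0, False)]
    print(outage_windows(samples))  # [(1.0, 2.0), (4.0, 4.0)]
```

Running this per target (mgmt to SSVM, SSVM to secondary storage, SSVM to 8.8.8.8, mgmt to node) shows at a glance whether all paths dropped in the same window or only the ones involving the management server.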
> >>>>> On 25-Jul-2019, at 3:48 PM, Andrija Panic <andrija.pa...@gmail.com> wrote:
> >>>>>
> >>>>> Can you observe the status of the SSVM (is it
> >>>>> UP/Connecting/Disconnected/Down) while you have issues?
> >>>>>
> >>>>> I would advise checking your Secondary Storage itself, and also running
> >>>>> the SSVM diagnostic script /usr/local/cloud/systemvm/ssvm-check.sh to see
> >>>>> whether it reports any errors with NFS or anything else.
> >>>>>
> >>>>> Lastly (and don't laugh), check that you don't have issues with your
> >>>>> networking equipment (some of us had VEEEERY strange connectivity issues
> >>>>> some years ago with crappy QCT/Quanta switches in an MLAG setup).
> >>>>>
> >>>>> Andrija
> >>>>>
> >>>>>> On Thu, 25 Jul 2019 at 15:42, Rakesh v <www.rakeshv....@gmail.com> wrote:
> >>>>>>
> >>>>>> Yes, I have set the IPs of the three MGT servers in the "host" field.
> >>>>>>
> >>>>>>> On 25-Jul-2019, at 2:14 PM, Pierre-Luc Dion <pdion...@apache.org> wrote:
> >>>>>>>
> >>>>>>> Do you have a load balancer in front of CloudStack? Did you set the
> >>>>>>> global setting "host" to the IP of the mgmt server?
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, 25 Jul 2019 at 03:24, Rakesh Venkatesh <www.rakeshv....@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Hello People
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> I have a strange issue where the mgt server times out sending a command
> >>>>>>>> to the secondary storage VM every hour, and because of this the UI is
> >>>>>>>> not accessible for a short time. Sometimes I have to restart the mgt
> >>>>>>>> server to get it back to a working state, and sometimes I don't need to
> >>>>>>>> restart it. I also see some exceptions while fetching the storage stats.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> The log says the secondary storage VM is lagging behind the mgt server
> >>>>>>>> on ping, and the mgt server sends a disconnect message to the other
> >>>>>>>> components. Can you let me know how to troubleshoot this issue? I
> >>>>>>>> destroyed the secondary storage VM, but the issue still persists. I
> >>>>>>>> checked the date/time on the mgt server and the SSVM, and they are the
> >>>>>>>> same. This has been happening for quite a few days now. Below are the
> >>>>>>>> logs:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> 2019-07-25 04:01:22,769 INFO  [c.c.a.m.AgentManagerImpl] (AgentMonitor-1:ctx-c33dbe74) (logid:5442158c) Found the following agents behind on ping: [183]
> >>>>>>>> 2019-07-25 04:01:22,775 WARN  [c.c.a.m.AgentManagerImpl] (AgentMonitor-1:ctx-c33dbe74) (logid:5442158c) Disconnect agent for CPVM/SSVM due to physical connection close. host: 183
> >>>>>>>> 2019-07-25 04:01:22,778 INFO  [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Host 183 is disconnecting with event ShutdownRequested
> >>>>>>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) The next status of agent 183is Disconnected, current status is Up
> >>>>>>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Deregistering link for 183 with state Disconnected
> >>>>>>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Remove Agent : 183
> >>>>>>>> 2019-07-25 04:01:22,781 DEBUG [c.c.a.m.ConnectedAgentAttache] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Processing Disconnect.
> >>>>>>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentAttache] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Seq 183-7541559051008607242: Sending disconnect to class com.cloud.agent.manager.SynchronousListener
> >>>>>>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.hypervisor.xenserver.discoverer.XcpServerDiscoverer
> >>>>>>>> 2019-07-25 04:01:22,782 DEBUG [c.c.u.n.NioConnection] (pool-2-thread-1:null) (logid:) Closing socket Socket[addr=/172.30.32.16,port=38250,localport=8250]
> >>>>>>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentAttache] (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq 183-7541559051008607242: Waiting some more time because this is the current command
> >>>>>>>> 2019-07-25 04:01:22,782 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.hypervisor.hyperv.discoverer.HypervServerDiscoverer
> >>>>>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentAttache] (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq 183-7541559051008607242: Waiting some more time because this is the current command
> >>>>>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.deploy.DeploymentPlanningManagerImpl
> >>>>>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.network.security.SecurityGroupListener
> >>>>>>>> 2019-07-25 04:01:22,783 INFO  [c.c.u.e.CSExceptionErrorCode] (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Could not find exception: com.cloud.exception.OperationTimedoutException in error code list for exceptions
> >>>>>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: org.apache.cloudstack.engine.orchestration.NetworkOrchestrator
> >>>>>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.vm.ClusteredVirtualMachineManagerImpl
> >>>>>>>> 2019-07-25 04:01:22,783 WARN  [c.c.a.m.AgentAttache] (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq 183-7541559051008607242: Timed out on null
> >>>>>>>> 2019-07-25 04:01:22,783 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.storage.listener.StoragePoolMonitor
> >>>>>>>> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentAttache] (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Seq 183-7541559051008607242: Cancelling.
> >>>>>>>> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.storage.secondary.SecondaryStorageListener
> >>>>>>>> 2019-07-25 04:01:22,784 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.network.SshKeysDistriMonitor
> >>>>>>>> 2019-07-25 04:01:22,785 DEBUG [o.a.c.s.RemoteHostEndPoint] (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Failed to send command, due to Agent:183, com.cloud.exception.OperationTimedoutException: Commands 7541559051008607242 to Host 183 timed out after 3600
> >>>>>>>> 2019-07-25 04:01:22,785 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.network.router.VpcVirtualNetworkApplianceManagerImpl
> >>>>>>>> 2019-07-25 04:01:22,785 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.storage.download.DownloadListener
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> 2019-07-25 04:01:22,785 ERROR [c.c.s.StatsCollector] (StatsCollector-2:ctx-b55657a9) (logid:dafc4881) Error trying to retrieve storage stats
> >>>>>>>> com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:183, com.cloud.exception.OperationTimedoutException: Commands 7541559051008607242 to Host 183 timed out after 3600
> >>>>>>>>     at org.apache.cloudstack.storage.RemoteHostEndPoint.sendMessage(RemoteHostEndPoint.java:133)
> >>>>>>>>     at com.cloud.server.StatsCollector$StorageCollector.runInContext(StatsCollector.java:1139)
> >>>>>>>>     at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
> >>>>>>>>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
> >>>>>>>>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
> >>>>>>>>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
> >>>>>>>>     at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
> >>>>>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >>>>>>>>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> >>>>>>>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> >>>>>>>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> >>>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >>>>>>>>     at java.lang.Thread.run(Thread.java:748)
> >>>>>>>> 2019-07-25 04:01:22,786 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.consoleproxy.ConsoleProxyListener
> >>>>>>>> 2019-07-25 04:01:22,789 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.storage.LocalStoragePoolListener
> >>>>>>>> 2019-07-25 04:01:22,789 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.storage.upload.UploadListener
> >>>>>>>> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.capacity.StorageCapacityListener
> >>>>>>>> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.capacity.ComputeCapacityListener
> >>>>>>>> 2019-07-25 04:01:22,790 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.network.SshKeysDistriMonitor
> >>>>>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.network.router.VirtualNetworkApplianceManagerImpl
> >>>>>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener
> >>>>>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.n.NetworkUsageManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Disconnected called on 183 with status Disconnected
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.agent.manager.AgentManagerImpl$BehindOnPingListener
> >>>>>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.a.m.AgentManagerImpl] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Sending Disconnect to listener: com.cloud.agent.manager.AgentManagerImpl$SetHostParamsListener
> >>>>>>>> 2019-07-25 04:01:22,791 DEBUG [c.c.h.Status] (AgentTaskPool-1:ctx-66de2057) (logid:841d2a63) Transition:[Resource state = Enabled, Agent event = ShutdownRequested, Host id = 183, name = s-2775-VM]
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Thanks and regards
> >>>>>>>> Rakesh venkatesh
> >>>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Andrija Panić
> >>>>
> >>>
> >>>
> >>> --
> >>>
> >>> Andrija Panić
> >>
> >
> >
> > --
> >
> > Andrija Panić
>


-- 

Andrija Panić
