Additional info. In the CPVM logs (after SSH-ing to it) I found:

2018-01-06 06:34:03,631 ERROR [consoleproxy.vnc.VncServerPacketReceiver] (Thread-13:null) Unexpected exception:
java.net.SocketException: Socket closed
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.net.SocketInputStream.read(SocketInputStream.java:224)
        at java.io.DataInputStream.readUnsignedByte(DataInputStream.java:288)
        at com.cloud.consoleproxy.vnc.VncServerPacketReceiver.run(VncServerPacketReceiver.java:61)
        at com.cloud.consoleproxy.vnc.VncClient.doConnect(VncClient.java:190)
        at com.cloud.consoleproxy.vnc.VncClient.connectTo(VncClient.java:156)
        at com.cloud.consoleproxy.ConsoleProxyVncClient$1.run(ConsoleProxyVncClient.java:99)
        at java.lang.Thread.run(Thread.java:748)
2018-01-06 06:34:03,645 INFO [consoleproxy.vnc.VncServerPacketReceiver] (Thread-13:null) Receiving thread exit processing, shutdown connection
2018-01-06 06:34:03,645 INFO [cloud.consoleproxy.ConsoleProxyVncClient] (Thread-13:null) Receiver thread stopped.
2018-01-06 06:34:03,645 INFO [cloud.consoleproxy.ConsoleProxyVncClient] (Thread-13:null) Received client close indication. remove viewer from map.
2018-01-06 06:34:03,672 DEBUG [cloud.agent.Agent] (Agent-Handler-3:null) Received response: Seq 49-263: { Ans: , MgmtId: 2485138019287, via: 49, Ver: v1, Flags: 100010, [{"com.cloud.agent.api.AgentControlAnswer":{"result":true,"wait":0}}] }
2018-01-06 06:34:04,453 INFO [consoleproxy.vnc.VncClientPacketSender] (Thread-14:null) Sending thread exit processing, shutdown connection

It seems that after this exception it stops functioning.
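For reference, the receiving side follows the usual blocking-read pattern: a thread sits in readUnsignedByte() and unwinds with SocketException("Socket closed") once the socket is closed by another thread. Below is a simplified sketch I put together of that pattern, not the actual CloudStack source; the class name and handler method are only illustrative:

import java.io.DataInputStream;
import java.io.IOException;
import java.net.Socket;

// Simplified sketch of the pattern behind the stack trace above (not the real
// CloudStack code): a receiver thread blocks in readUnsignedByte() and exits
// through SocketException("Socket closed") when the socket is closed elsewhere.
class VncReceiverSketch implements Runnable {
    private final Socket socket;

    VncReceiverSketch(Socket socket) {
        this.socket = socket;
    }

    @Override
    public void run() {
        try (DataInputStream in = new DataInputStream(socket.getInputStream())) {
            while (true) {
                int messageType = in.readUnsignedByte(); // blocks here until data arrives or the socket closes
                handleServerMessage(messageType, in);    // message dispatch omitted in this sketch
            }
        } catch (IOException e) {
            // "Socket closed" lands here once the connection is torn down concurrently;
            // the thread then exits, which matches the INFO lines about shutdown above.
            System.out.println("Receiver thread stopped: " + e.getMessage());
        }
    }

    private void handleServerMessage(int type, DataInputStream in) throws IOException {
        // no-op in this sketch
    }
}

If nothing restarts or replaces that receiver while the proxy keeps accepting new sessions, consoles would stay broken until the CPVM is rebooted or recreated, but that is only a guess on my side.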
2018-01-06 13:28 GMT+07:00 Ivan Kudryavtsev <kudryavtsev...@bw-sw.com>:

> Hello, colleagues.
>
> During the last days I have found 2 bugs which I believe are critical for the 4.11 release. I would like to share them here and get help if possible:
>
> 1. CPVM bug. I use a wildcard certificate issued by Comodo CA. I uploaded it to CS via the UI and destroyed the CPVM to force it to use it. It uses it like a charm, but after some amount of time it loses it and console proxy connections are no longer possible. After it is rebooted or recreated, everything works well again. I'm not familiar with the CPVM at all and cannot even imagine what could be wrong here.
>
> 1a. The CPVM has debug enabled and the logs include tons of messages like:
>
> 2018-01-06 06:13:57,069 DEBUG [cloud.consoleproxy.ConsoleProxyAjaxImageHandler] (Thread-4159:null) AjaxImageHandler /ajaximg?token=RcHSrvzegyrjZAlc1Wjifcwv9P8WwK3eH63SuIS8WFFGssxymmjdYkZ4-S4ilY1UHxX612Lt_5Xi1Z5JaoCfDSf_UCi8lTIsPEBlDpUEWQg1IblYu0HxvoDugX9J4XgAdpj74qg_U4pOs74dzdZFB50PB_HxcMhzUqd5plH914PmRDw5k0ONaa183CsGa7DcGVvWaR_eYP_8_CArahGAjHt04Kx227tjyMx4Zaju7iNyxpBWxtBC5YJyj8rjv7IeA_0Pevz91pWn6OE1pkeLwGeFSV8pZw4BWg95SG97A-I&key=2020&ts=1515219237015
> 2018-01-06 06:13:57,070 DEBUG [cloud.consoleproxy.ConsoleProxyHttpHandlerHelper] (Thread-4159:null) decode token. host: 10.252.2.10
> 2018-01-06 06:13:57,070 DEBUG [cloud.consoleproxy.ConsoleProxyHttpHandlerHelper] (Thread-4159:null) decode token. port: 5903
> 2018-01-06 06:13:57,070 DEBUG [cloud.consoleproxy.ConsoleProxyHttpHandlerHelper] (Thread-4159:null) decode token. tag: 375c62b5-74d9-4494-8b79-0d7c76cff10f
>
> Every opened session is dumped to the logs. I suppose this is dangerous and could lead to FS overuse and CPVM failure.
>
> /dev/vda10      368M   63M  287M  19% /var/log
>
> Might it be that (1) is a consequence of (1a)?
>
> 2. High CPU utilization bug. After the management server is launched it uses 0 CPU, because I run a development cloud. After two days I see that 2 cores are 50% used by the management server processes; several days ago I saw the management server processes utilize almost all of the available CPU. Surprisingly, it continues to function (API, UI), with no active API utilization in the logs.
>
> The only two suspicious things I found for the last incident are:
>
> root@cs2-head1:/var/log/cloudstack/management# zgrep ERROR management-server.log.2018-01-04.gz
> 2018-01-04 12:58:23,391 ERROR [c.c.c.ClusterManagerImpl] (localhost-startStop-1:null) (logid:) Unable to ping management server at 10.252.2.2:9090 due to ConnectException
> 2018-01-04 12:58:25,743 ERROR [c.c.u.PropertiesUtil] (localhost-startStop-1:null) (logid:) Unable to find properties file: commands.properties
> 2018-01-04 14:36:23,874 ERROR [c.c.u.PropertiesUtil] (localhost-startStop-1:null) (logid:) Unable to find properties file: commands.properties
> 2018-01-04 14:43:23,043 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-5:ctx-e566f561 job-38158/job-38188 ctx-b1887051) (logid:be4b64e0) Invocation exception, caused by: com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM[SecondaryStorageVm|s-24-VM]Scope=interface com.cloud.dc.DataCenter; id=1
> 2018-01-04 14:43:23,043 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-4:ctx-faf69614 job-38155/job-38185 ctx-83290fa8) (logid:65010252) Invocation exception, caused by: com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM[ConsoleProxy|v-10-VM]Scope=interface com.cloud.dc.DataCenter; id=1
> 2018-01-04 14:43:23,044 ERROR [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-5:ctx-e566f561 job-38158/job-38188) (logid:be4b64e0) Unable to complete AsyncJobVO {id:38188, userId: 1, accountId: 1, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkStart, cmdInfo: rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIAC0oABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljbHVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAdwbGFubmVycQB-AANMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAAABAAAAAAAAAAEAAAAAAAAAGHQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAAHBwcHBwcHBwcHA, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 2485138019287, completeMsid: null, lastUpdated: null, lastPolled: null, created: Thu Jan 04 14:43:22 KRAT 2018}, job origin:38158
> 2018-01-04 14:43:23,044 ERROR [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-4:ctx-faf69614 job-38155/job-38185) (logid:65010252) Unable to complete AsyncJobVO {id:38185, userId: 1, accountId: 1, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkStart, cmdInfo: rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIAC0oABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljbHVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAdwbGFubmVycQB-AANMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAAABAAAAAAAAAAEAAAAAAAAACnQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAAHBwcHBwcHBwcHA, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 2485138019287, completeMsid: null, lastUpdated: null, lastPolled: null, created: Thu Jan 04 14:43:21 KRAT 2018}, job origin:38155
> 2018-01-04 14:43:25,127 ERROR [o.a.c.s.d.d.CloudStackPrimaryDataStoreDriverImpl] (consoleproxy-1:ctx-6f2f9b7b) (logid:25acd369) No remote endpoint to send DeleteCommand, check if host or ssvm is down?
> 2018-01-04 14:43:25,127 ERROR [o.a.c.s.d.d.CloudStackPrimaryDataStoreDriverImpl] (secstorage-1:ctx-ae3adf87) (logid:4db1e2a0) No remote endpoint to send DeleteCommand, check if host or ssvm is down?
>
> The worst thing is that I don't even have an idea how to catch it. Also, as I have the second management server down, I see a lot of events like:
>
> 2017-12-28 05:08:34,927 DEBUG [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:ctx-9bbae21c) (logid:9a1b0b21) Management server heartbeat takes too long to finish. profiler: Done. Duration: 1935ms, profilerHeartbeatUpdate: Done. Duration: 617ms, profilerPeerScan: Done. Duration: 1317ms
>
> Could this be a reason for the high CPU utilization?
>
> --
> With best regards, Ivan Kudryavtsev
> Bitworks Software, Ltd.
> Cell: +7-923-414-1515
> WWW: http://bitworks.software/ <http://bw-sw.com/>

--
With best regards, Ivan Kudryavtsev
Bitworks Software, Ltd.
Cell: +7-923-414-1515
WWW: http://bitworks.software/ <http://bw-sw.com/>
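P.S. Regarding (2) and "I don't even have an idea how to catch it": one common way is to correlate the busiest native threads (top -H -p <management server pid>) with a jstack dump taken at the same moment, so you can see which Java threads burn the CPU. The same numbers can also be read through the standard ThreadMXBean API; below is a minimal sketch of that, nothing CloudStack-specific, with an illustrative class name of my own, and it only tells you something if it runs inside (or is pointed over JMX at) the management server JVM:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Sketch only: prints per-thread CPU time for the JVM it runs in. To be useful
// against the management server it has to run inside that JVM, or the same data
// can be read remotely over JMX; the class name here is illustrative.
public class BusyThreadReport {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            System.out.println("Thread CPU time is not supported on this JVM");
            return;
        }
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            long cpuNanos = mx.getThreadCpuTime(id); // -1 if the thread is gone or measurement is disabled
            if (info != null && cpuNanos > 0) {
                System.out.printf("%-50s cpu=%6d ms state=%s%n",
                        info.getThreadName(), cpuNanos / 1_000_000, info.getThreadState());
            }
        }
    }
}

Sorting its output by the cpu column should point at the runaway threads, e.g. whether they are the Cluster-Heartbeat or Work-Job-Executor ones from the logs above.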