Additional info. In the CPVM logs (after SSH-ing to it) I found:

2018-01-06 06:34:03,631 ERROR [consoleproxy.vnc.VncServerPacketReceiver] (Thread-13:null) Unexpected exception:
java.net.SocketException: Socket closed
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.net.SocketInputStream.read(SocketInputStream.java:141)
        at java.net.SocketInputStream.read(SocketInputStream.java:224)
        at java.io.DataInputStream.readUnsignedByte(DataInputStream.java:288)
        at com.cloud.consoleproxy.vnc.VncServerPacketReceiver.run(VncServerPacketReceiver.java:61)
        at com.cloud.consoleproxy.vnc.VncClient.doConnect(VncClient.java:190)
        at com.cloud.consoleproxy.vnc.VncClient.connectTo(VncClient.java:156)
        at com.cloud.consoleproxy.ConsoleProxyVncClient$1.run(ConsoleProxyVncClient.java:99)
        at java.lang.Thread.run(Thread.java:748)
2018-01-06 06:34:03,645 INFO [consoleproxy.vnc.VncServerPacketReceiver] (Thread-13:null) Receiving thread exit processing, shutdown connection
2018-01-06 06:34:03,645 INFO [cloud.consoleproxy.ConsoleProxyVncClient] (Thread-13:null) Receiver thread stopped.
2018-01-06 06:34:03,645 INFO [cloud.consoleproxy.ConsoleProxyVncClient] (Thread-13:null) Received client close indication. remove viewer from map.
2018-01-06 06:34:03,672 DEBUG [cloud.agent.Agent] (Agent-Handler-3:null) Received response: Seq 49-263: { Ans: , MgmtId: 2485138019287, via: 49, Ver: v1, Flags: 100010, [{"com.cloud.agent.api.AgentControlAnswer":{"result":true,"wait":0}}] }
2018-01-06 06:34:04,453 INFO [consoleproxy.vnc.VncClientPacketSender] (Thread-14:null) Sending thread exit processing, shutdown connection

It seems that after this exception it stops functioning.
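For reference, the receiving side follows the usual blocking-read pattern: a thread sits in readUnsignedByte() and unwinds with SocketException("Socket closed") once the socket is closed by another thread. Below is a simplified sketch I put together of that pattern, not the actual CloudStack source; the class name and handler method are only illustrative:

import java.io.DataInputStream;
import java.io.IOException;
import java.net.Socket;

// Simplified sketch of the pattern behind the stack trace above (not the real
// CloudStack code): a receiver thread blocks in readUnsignedByte() and exits
// through SocketException("Socket closed") when the socket is closed elsewhere.
class VncReceiverSketch implements Runnable {
    private final Socket socket;

    VncReceiverSketch(Socket socket) {
        this.socket = socket;
    }

    @Override
    public void run() {
        try (DataInputStream in = new DataInputStream(socket.getInputStream())) {
            while (true) {
                int messageType = in.readUnsignedByte(); // blocks here until data arrives or the socket closes
                handleServerMessage(messageType, in);    // message dispatch omitted in this sketch
            }
        } catch (IOException e) {
            // "Socket closed" lands here once the connection is torn down concurrently;
            // the thread then exits, which matches the INFO lines about shutdown above.
            System.out.println("Receiver thread stopped: " + e.getMessage());
        }
    }

    private void handleServerMessage(int type, DataInputStream in) throws IOException {
        // no-op in this sketch
    }
}

If nothing restarts or replaces that receiver while the proxy keeps accepting new sessions, consoles would stay broken until the CPVM is rebooted or recreated, but that is only a guess on my side.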
2018-01-06 13:28 GMT+07:00 Ivan Kudryavtsev <kudryavtsev...@bw-sw.com>:

> Hello, colleagues.
>
> During the last days I have found 2 bugs which I believe are critical for the 4.11 release. I would like to share them here and get help if possible:
>
> 1. CPVM bug. I use a wildcard certificate issued by Comodo CA. I uploaded it to CS via the UI and destroyed the CPVM to force it to use it. It uses it like a charm, but after some amount of time it loses it and console proxy connections are no longer possible. After it is rebooted or recreated, everything works well again. I'm not familiar with the CPVM at all and cannot even imagine what could be wrong here.
>
> 1a. The CPVM has debug enabled and the logs include tons of messages like:
>
> 2018-01-06 06:13:57,069 DEBUG [cloud.consoleproxy.ConsoleProxyAjaxImageHandler] (Thread-4159:null) AjaxImageHandler /ajaximg?token=RcHSrvzegyrjZAlc1Wjifcwv9P8WwK3eH63SuIS8WFFGssxymmjdYkZ4-S4ilY1UHxX612Lt_5Xi1Z5JaoCfDSf_UCi8lTIsPEBlDpUEWQg1IblYu0HxvoDugX9J4XgAdpj74qg_U4pOs74dzdZFB50PB_HxcMhzUqd5plH914PmRDw5k0ONaa183CsGa7DcGVvWaR_eYP_8_CArahGAjHt04Kx227tjyMx4Zaju7iNyxpBWxtBC5YJyj8rjv7IeA_0Pevz91pWn6OE1pkeLwGeFSV8pZw4BWg95SG97A-I&key=2020&ts=1515219237015
> 2018-01-06 06:13:57,070 DEBUG [cloud.consoleproxy.ConsoleProxyHttpHandlerHelper] (Thread-4159:null) decode token. host: 10.252.2.10
> 2018-01-06 06:13:57,070 DEBUG [cloud.consoleproxy.ConsoleProxyHttpHandlerHelper] (Thread-4159:null) decode token. port: 5903
> 2018-01-06 06:13:57,070 DEBUG [cloud.consoleproxy.ConsoleProxyHttpHandlerHelper] (Thread-4159:null) decode token. tag: 375c62b5-74d9-4494-8b79-0d7c76cff10f
>
> Every opened session is dumped to the logs. I suppose this is dangerous and could lead to FS overuse and CPVM failure.
>
> /dev/vda10      368M   63M  287M  19% /var/log
>
> Might it be that (1) is a consequence of (1a)?
>
> 2. High CPU utilization bug. After the management server is launched it uses 0 CPU, because I run a development cloud. After two days I see that 2 cores are 50% used by the management server processes; several days ago I saw the management server processes utilize almost all of the available CPU. Surprisingly, it continues to function (API, UI), with no active API utilization in the logs.
>
> The only two suspicious things I found for the last incident are:
>
> root@cs2-head1:/var/log/cloudstack/management# zgrep ERROR management-server.log.2018-01-04.gz
> 2018-01-04 12:58:23,391 ERROR [c.c.c.ClusterManagerImpl] (localhost-startStop-1:null) (logid:) Unable to ping management server at 10.252.2.2:9090 due to ConnectException
> 2018-01-04 12:58:25,743 ERROR [c.c.u.PropertiesUtil] (localhost-startStop-1:null) (logid:) Unable to find properties file: commands.properties
> 2018-01-04 14:36:23,874 ERROR [c.c.u.PropertiesUtil] (localhost-startStop-1:null) (logid:) Unable to find properties file: commands.properties
> 2018-01-04 14:43:23,043 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-5:ctx-e566f561 job-38158/job-38188 ctx-b1887051) (logid:be4b64e0) Invocation exception, caused by: com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM[SecondaryStorageVm|s-24-VM]Scope=interface com.cloud.dc.DataCenter; id=1
> 2018-01-04 14:43:23,043 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-4:ctx-faf69614 job-38155/job-38185 ctx-83290fa8) (logid:65010252) Invocation exception, caused by: com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM[ConsoleProxy|v-10-VM]Scope=interface com.cloud.dc.DataCenter; id=1
> 2018-01-04 14:43:23,044 ERROR [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-5:ctx-e566f561 job-38158/job-38188) (logid:be4b64e0) Unable to complete AsyncJobVO {id:38188, userId: 1, accountId: 1, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkStart, cmdInfo: rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIAC0oABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljbHVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAdwbGFubmVycQB-AANMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAAABAAAAAAAAAAEAAAAAAAAAGHQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAAHBwcHBwcHBwcHA, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 2485138019287, completeMsid: null, lastUpdated: null, lastPolled: null, created: Thu Jan 04 14:43:22 KRAT 2018}, job origin:38158
> 2018-01-04 14:43:23,044 ERROR [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-4:ctx-faf69614 job-38155/job-38185) (logid:65010252) Unable to complete AsyncJobVO {id:38185, userId: 1, accountId: 1, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkStart, cmdInfo: rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIAC0oABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljbHVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAdwbGFubmVycQB-AANMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAAABAAAAAAAAAAEAAAAAAAAACnQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAAHBwcHBwcHBwcHA, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 2485138019287, completeMsid: null, lastUpdated: null, lastPolled: null, created: Thu Jan 04 14:43:21 KRAT 2018}, job origin:38155
> 2018-01-04 14:43:25,127 ERROR [o.a.c.s.d.d.CloudStackPrimaryDataStoreDriverImpl] (consoleproxy-1:ctx-6f2f9b7b) (logid:25acd369) No remote endpoint to send DeleteCommand, check if host or ssvm is down?
> 2018-01-04 14:43:25,127 ERROR [o.a.c.s.d.d.CloudStackPrimaryDataStoreDriverImpl] (secstorage-1:ctx-ae3adf87) (logid:4db1e2a0) No remote endpoint to send DeleteCommand, check if host or ssvm is down?
>
> The worst thing is that I don't even have an idea how to catch it. Also, as I have the second management server down, I see a lot of events like:
>
> 2017-12-28 05:08:34,927 DEBUG [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:ctx-9bbae21c) (logid:9a1b0b21) Management server heartbeat takes too long to finish. profiler: Done. Duration: 1935ms, profilerHeartbeatUpdate: Done. Duration: 617ms, profilerPeerScan: Done. Duration: 1317ms
>
> Could this be a reason for the high CPU utilization?
>
> --
> With best regards, Ivan Kudryavtsev
> Bitworks Software, Ltd.
> Cell: +7-923-414-1515
> WWW: http://bitworks.software/ <http://bw-sw.com/>

--
With best regards, Ivan Kudryavtsev
Bitworks Software, Ltd.
Cell: +7-923-414-1515
WWW: http://bitworks.software/ <http://bw-sw.com/>
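P.S. Regarding (2) and "I don't even have an idea how to catch it": one common way is to correlate the busiest native threads (top -H -p <management server pid>) with a jstack dump taken at the same moment, so you can see which Java threads burn the CPU. The same numbers can also be read through the standard ThreadMXBean API; below is a minimal sketch of that, nothing CloudStack-specific, with an illustrative class name of my own, and it only tells you something if it runs inside (or is pointed over JMX at) the management server JVM:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Sketch only: prints per-thread CPU time for the JVM it runs in. To be useful
// against the management server it has to run inside that JVM, or the same data
// can be read remotely over JMX; the class name here is illustrative.
public class BusyThreadReport {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            System.out.println("Thread CPU time is not supported on this JVM");
            return;
        }
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            long cpuNanos = mx.getThreadCpuTime(id); // -1 if the thread is gone or measurement is disabled
            if (info != null && cpuNanos > 0) {
                System.out.printf("%-50s cpu=%6d ms state=%s%n",
                        info.getThreadName(), cpuNanos / 1_000_000, info.getThreadState());
            }
        }
    }
}

Sorting its output by the cpu column should point at the runaway threads, e.g. whether they are the Cluster-Heartbeat or Work-Job-Executor ones from the logs above.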