I use a wildcard cert on 4.9.2 and it's fine. We haven't gone to 4.10 yet to test. We'll probably go straight to 4.11 when it's released.
We have also had the high CPU on the mgmt servers in our 4.9.x deployments. It is very frustrating, and it happens every few days. We haven't been able to track down why yet. In a different thread a while ago, Simon Weller also reported high CPU issues, but I'm not sure if he ever found the culprit either.

-----Original Message-----
From: Ivan Kudryavtsev [mailto:kudryavtsev...@bw-sw.com]
Sent: Saturday, January 6, 2018 12:28 AM
To: dev <dev@cloudstack.apache.org>
Subject: Found 2 bugs in ACS 4.10. Possibly exist in 4.11 (master)

Hello, colleagues.

During the last few days I have found two bugs which I believe are critical for the 4.11 release. I would like to share them here and get help if possible:

1. CPVM bug. I use a wildcard certificate issued by Comodo CA. I uploaded it to CloudStack via the UI and destroyed the CPVM to force it to pick the certificate up. It uses it like a charm, but after some amount of time it loses it and console proxy connections are no longer possible. After the CPVM is rebooted or recreated, everything works well again. I'm not familiar with the CPVM internals at all and cannot even imagine what could be wrong here.

1a. The CPVM has debug logging enabled, and the logs include tons of messages like:

2018-01-06 06:13:57,069 DEBUG [cloud.consoleproxy.ConsoleProxyAjaxImageHandler] (Thread-4159:null) AjaxImageHandler /ajaximg?token=RcHSrvzegyrjZAlc1Wjifcwv9P8WwK3eH63SuIS8WFFGssxymmjdYkZ4-S4ilY1UHxX612Lt_5Xi1Z5JaoCfDSf_UCi8lTIsPEBlDpUEWQg1IblYu0HxvoDugX9J4XgAdpj74qg_U4pOs74dzdZFB50PB_HxcMhzUqd5plH914PmRDw5k0ONaa183CsGa7DcGVvWaR_eYP_8_CArahGAjHt04Kx227tjyMx4Zaju7iNyxpBWxtBC5YJyj8rjv7IeA_0Pevz91pWn6OE1pkeLwGeFSV8pZw4BWg95SG97A-I&key=2020&ts=1515219237015
2018-01-06 06:13:57,070 DEBUG [cloud.consoleproxy.ConsoleProxyHttpHandlerHelper] (Thread-4159:null) decode token. host: 10.252.2.10
2018-01-06 06:13:57,070 DEBUG [cloud.consoleproxy.ConsoleProxyHttpHandlerHelper] (Thread-4159:null) decode token. port: 5903
2018-01-06 06:13:57,070 DEBUG [cloud.consoleproxy.ConsoleProxyHttpHandlerHelper] (Thread-4159:null) decode token. tag: 375c62b5-74d9-4494-8b79-0d7c76cff10f

Every opened session is dumped to the logs. I suppose this is dangerous and could lead to filesystem overuse and CPVM failure:

/dev/vda10 368M 63M 287M 19% /var/log

Might it be that (1) is a consequence of (1a)?

2. High CPU utilization bug. Right after the management server is launched it uses 0 CPU, because I run a development cloud. After two days I see two cores 50% busy with management server processes, and several days ago I saw the management server processes consume almost all available CPU. Surprisingly, it continues to function (API, UI), with no active API utilization visible in the logs.
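When the CPU spike is in progress, one way to narrow it down is to map the busiest OS-level threads of the management server JVM to Java stack traces. This is only a rough sketch with standard Linux tooling: the pgrep pattern and the 'cloud' user are guesses about the packaging, and TID is a placeholder for the busiest thread id reported by top:

# find the management server JVM; adjust the pattern to whatever your packages actually launch
MS_PID=$(pgrep -f cloudstack-management | head -n 1)

# per-thread CPU usage inside that process; note the id(s) of the busiest thread(s)
top -H -b -n 1 -p "$MS_PID" | head -n 40

# take a thread dump as the user that owns the JVM (often 'cloud'), then look up the
# busy thread by its hex id, which jstack prints as nid=0x...
sudo -u cloud jstack "$MS_PID" > /tmp/ms-threads.txt
TID=12345   # placeholder: replace with the busiest thread id from the top output above
grep -A 15 "nid=$(printf '0x%x' "$TID")" /tmp/ms-threads.txt

If the hot threads always belong to the same pool (for example the cluster heartbeat or the async job executors), that at least points at the subsystem worth digging into.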
The only two suspicious things I found for the last incident are:

root@cs2-head1:/var/log/cloudstack/management# zgrep ERROR management-server.log.2018-01-04.gz
2018-01-04 12:58:23,391 ERROR [c.c.c.ClusterManagerImpl] (localhost-startStop-1:null) (logid:) Unable to ping management server at 10.252.2.2:9090 due to ConnectException
2018-01-04 12:58:25,743 ERROR [c.c.u.PropertiesUtil] (localhost-startStop-1:null) (logid:) Unable to find properties file: commands.properties
2018-01-04 14:36:23,874 ERROR [c.c.u.PropertiesUtil] (localhost-startStop-1:null) (logid:) Unable to find properties file: commands.properties
2018-01-04 14:43:23,043 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-5:ctx-e566f561 job-38158/job-38188 ctx-b1887051) (logid:be4b64e0) Invocation exception, caused by: com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM[SecondaryStorageVm|s-24-VM]Scope=interface com.cloud.dc.DataCenter; id=1
2018-01-04 14:43:23,043 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-4:ctx-faf69614 job-38155/job-38185 ctx-83290fa8) (logid:65010252) Invocation exception, caused by: com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM[ConsoleProxy|v-10-VM]Scope=interface com.cloud.dc.DataCenter; id=1
2018-01-04 14:43:23,044 ERROR [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-5:ctx-e566f561 job-38158/job-38188) (logid:be4b64e0) Unable to complete AsyncJobVO {id:38188, userId: 1, accountId: 1, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkStart, cmdInfo: rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIAC0oABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljbHVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAdwbGFubmVycQB-AANMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAAABAAAAAAAAAAEAAAAAAAAAGHQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAAHBwcHBwcHBwcHA, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 2485138019287, completeMsid: null, lastUpdated: null, lastPolled: null, created: Thu Jan 04 14:43:22 KRAT 2018}, job origin:38158
2018-01-04 14:43:23,044 ERROR [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-4:ctx-faf69614 job-38155/job-38185) (logid:65010252) Unable to complete AsyncJobVO {id:38185, userId: 1, accountId: 1, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkStart, cmdInfo: rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIAC0oABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljbHVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAdwbGFubmVycQB-AANMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAAABAAAAAAAAAAEAAAAAAAAACnQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAAHBwcHBwcHBwcHA, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 2485138019287, completeMsid: null, lastUpdated: null, lastPolled: null, created: Thu Jan 04 14:43:21 KRAT 2018}, job origin:38155
2018-01-04 14:43:25,127 ERROR [o.a.c.s.d.d.CloudStackPrimaryDataStoreDriverImpl] (consoleproxy-1:ctx-6f2f9b7b) (logid:25acd369) No remote endpoint to send DeleteCommand, check if host or ssvm is down?
2018-01-04 14:43:25,127 ERROR [o.a.c.s.d.d.CloudStackPrimaryDataStoreDriverImpl] (secstorage-1:ctx-ae3adf87) (logid:4db1e2a0) No remote endpoint to send DeleteCommand, check if host or ssvm is down?

The worst thing is that I don't even have an idea how to catch it.
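One thing that might help correlate these with the CPU ramp-up is a quick per-hour ERROR count across the rotated logs, for example (just a sketch; the filename glob depends on how your logs are rotated):

# ERROR lines per hour; the first 13 characters of each line are the date plus the hour
zgrep -h ERROR management-server.log.*.gz | cut -c1-13 | sort | uniq -c

If the error volume jumps at roughly the same time the CPU climbs, that narrows the window to look at.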
Also, as I have the second management server down, I see a lot of events like:

2017-12-28 05:08:34,927 DEBUG [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:ctx-9bbae21c) (logid:9a1b0b21) Management server heartbeat takes too long to finish. profiler: Done. Duration: 1935ms, profilerHeartbeatUpdate: Done. Duration: 617ms, profilerPeerScan: Done. Duration: 1317ms

Could this be a reason for the high CPU utilization?

--
With best regards, Ivan Kudryavtsev
Bitworks Software, Ltd.
Cell: +7-923-414-1515
WWW: http://bitworks.software/ <http://bw-sw.com/>
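To check whether the dead peer is what slows the heartbeat down, it might be worth counting how often that warning fires per daily log and looking at what the cluster bookkeeping records for the second node. A sketch, assuming the default 'cloud' MySQL user and database and that the management server rows live in the mshost table (verify the table layout on your version):

# slow-heartbeat warnings per rotated (daily) log file
zgrep -c 'heartbeat takes too long' management-server.log.*.gz

# current cluster view of each management server
mysql -u cloud -p cloud -e 'SELECT * FROM mshost\G'

If the warning count tracks the days on which the CPU climbs, the unreachable 10.252.2.2 peer would be the first thing to rule out.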