I use a wildcard cert on 4.9.2 and it's fine. We haven't gone to 4.10 yet to test. We'll probably go straight to 4.11 when it's released.
We have also had the high CPU on the mgmt servers in our 4.9.x deployments. It is very frustrating, and it happens every few days. We haven't been able to track down why yet. In a different thread a while ago, Simon Weller also reported high CPU issues, but I'm not sure if he ever found the culprit either.

-----Original Message-----
From: Ivan Kudryavtsev [mailto:kudryavtsev...@bw-sw.com]
Sent: Saturday, January 6, 2018 12:28 AM
To: dev <dev@cloudstack.apache.org>
Subject: Found 2 bugs in ACS 4.10. Possibly exist in 4.11 (master)

Hello, colleagues.

During the last few days I have found two bugs which I believe are critical for the 4.11 release. I would like to share them here and get help if possible:

1. CPVM bug. I use a wildcard certificate issued by Comodo CA. I uploaded it to CloudStack via the UI and destroyed the CPVM to force it to pick the certificate up. It uses it like a charm, but after some amount of time it loses it and console proxy connections are no longer possible. After the CPVM is rebooted or recreated, everything works well again. I'm not familiar with the CPVM internals at all and cannot even imagine what could be wrong here.

1a. The CPVM has debug logging enabled, and the logs include tons of messages like:

2018-01-06 06:13:57,069 DEBUG [cloud.consoleproxy.ConsoleProxyAjaxImageHandler] (Thread-4159:null) AjaxImageHandler /ajaximg?token=RcHSrvzegyrjZAlc1Wjifcwv9P8WwK3eH63SuIS8WFFGssxymmjdYkZ4-S4ilY1UHxX612Lt_5Xi1Z5JaoCfDSf_UCi8lTIsPEBlDpUEWQg1IblYu0HxvoDugX9J4XgAdpj74qg_U4pOs74dzdZFB50PB_HxcMhzUqd5plH914PmRDw5k0ONaa183CsGa7DcGVvWaR_eYP_8_CArahGAjHt04Kx227tjyMx4Zaju7iNyxpBWxtBC5YJyj8rjv7IeA_0Pevz91pWn6OE1pkeLwGeFSV8pZw4BWg95SG97A-I&key=2020&ts=1515219237015
2018-01-06 06:13:57,070 DEBUG [cloud.consoleproxy.ConsoleProxyHttpHandlerHelper] (Thread-4159:null) decode token. host: 10.252.2.10
2018-01-06 06:13:57,070 DEBUG [cloud.consoleproxy.ConsoleProxyHttpHandlerHelper] (Thread-4159:null) decode token. port: 5903
2018-01-06 06:13:57,070 DEBUG [cloud.consoleproxy.ConsoleProxyHttpHandlerHelper] (Thread-4159:null) decode token. tag: 375c62b5-74d9-4494-8b79-0d7c76cff10f

Every opened session is dumped to the logs. I suppose this is dangerous and could lead to filesystem overuse and CPVM failure:

/dev/vda10 368M 63M 287M 19% /var/log

Might it be that (1) is a consequence of (1a)?

2. High CPU utilization bug. Right after the management server is launched it uses 0 CPU, because I run a development cloud. After two days I see two cores 50% busy with management server processes, and several days ago I saw the management server processes consume almost all available CPU. Surprisingly, it continues to function (API, UI), with no active API utilization visible in the logs.
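When the CPU spike is in progress, one way to narrow it down is to map the busiest OS-level threads of the management server JVM to Java stack traces. This is only a rough sketch with standard Linux tooling: the pgrep pattern and the 'cloud' user are guesses about the packaging, and TID is a placeholder for the busiest thread id reported by top:

# find the management server JVM; adjust the pattern to whatever your packages actually launch
MS_PID=$(pgrep -f cloudstack-management | head -n 1)

# per-thread CPU usage inside that process; note the id(s) of the busiest thread(s)
top -H -b -n 1 -p "$MS_PID" | head -n 40

# take a thread dump as the user that owns the JVM (often 'cloud'), then look up the
# busy thread by its hex id, which jstack prints as nid=0x...
sudo -u cloud jstack "$MS_PID" > /tmp/ms-threads.txt
TID=12345   # placeholder: replace with the busiest thread id from the top output above
grep -A 15 "nid=$(printf '0x%x' "$TID")" /tmp/ms-threads.txt

If the hot threads always belong to the same pool (for example the cluster heartbeat or the async job executors), that at least points at the subsystem worth digging into.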
The only two suspicious things I found for the last incident are:

root@cs2-head1:/var/log/cloudstack/management# zgrep ERROR management-server.log.2018-01-04.gz
2018-01-04 12:58:23,391 ERROR [c.c.c.ClusterManagerImpl] (localhost-startStop-1:null) (logid:) Unable to ping management server at 10.252.2.2:9090 due to ConnectException
2018-01-04 12:58:25,743 ERROR [c.c.u.PropertiesUtil] (localhost-startStop-1:null) (logid:) Unable to find properties file: commands.properties
2018-01-04 14:36:23,874 ERROR [c.c.u.PropertiesUtil] (localhost-startStop-1:null) (logid:) Unable to find properties file: commands.properties
2018-01-04 14:43:23,043 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-5:ctx-e566f561 job-38158/job-38188 ctx-b1887051) (logid:be4b64e0) Invocation exception, caused by: com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM[SecondaryStorageVm|s-24-VM]Scope=interface com.cloud.dc.DataCenter; id=1
2018-01-04 14:43:23,043 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-4:ctx-faf69614 job-38155/job-38185 ctx-83290fa8) (logid:65010252) Invocation exception, caused by: com.cloud.exception.InsufficientServerCapacityException: Unable to create a deployment for VM[ConsoleProxy|v-10-VM]Scope=interface com.cloud.dc.DataCenter; id=1
2018-01-04 14:43:23,044 ERROR [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-5:ctx-e566f561 job-38158/job-38188) (logid:be4b64e0) Unable to complete AsyncJobVO {id:38188, userId: 1, accountId: 1, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkStart, cmdInfo: rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIAC0oABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljbHVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAdwbGFubmVycQB-AANMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAAABAAAAAAAAAAEAAAAAAAAAGHQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAAHBwcHBwcHBwcHA, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 2485138019287, completeMsid: null, lastUpdated: null, lastPolled: null, created: Thu Jan 04 14:43:22 KRAT 2018}, job origin:38158
2018-01-04 14:43:23,044 ERROR [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-4:ctx-faf69614 job-38155/job-38185) (logid:65010252) Unable to complete AsyncJobVO {id:38185, userId: 1, accountId: 1, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkStart, cmdInfo: rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIAC0oABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljbHVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAdwbGFubmVycQB-AANMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAAABAAAAAAAAAAEAAAAAAAAACnQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAAHBwcHBwcHBwcHA, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 2485138019287, completeMsid: null, lastUpdated: null, lastPolled: null, created: Thu Jan 04 14:43:21 KRAT 2018}, job origin:38155
2018-01-04 14:43:25,127 ERROR [o.a.c.s.d.d.CloudStackPrimaryDataStoreDriverImpl] (consoleproxy-1:ctx-6f2f9b7b) (logid:25acd369) No remote endpoint to send DeleteCommand, check if host or ssvm is down?
2018-01-04 14:43:25,127 ERROR [o.a.c.s.d.d.CloudStackPrimaryDataStoreDriverImpl] (secstorage-1:ctx-ae3adf87) (logid:4db1e2a0) No remote endpoint to send DeleteCommand, check if host or ssvm is down?

The worst thing is that I don't even have an idea how to catch it.
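One thing that might help correlate these with the CPU ramp-up is a quick per-hour ERROR count across the rotated logs, for example (just a sketch; the filename glob depends on how your logs are rotated):

# ERROR lines per hour; the first 13 characters of each line are the date plus the hour
zgrep -h ERROR management-server.log.*.gz | cut -c1-13 | sort | uniq -c

If the error volume jumps at roughly the same time the CPU climbs, that narrows the window to look at.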
Also, as I have the second management server down, I see a lot of events like:

2017-12-28 05:08:34,927 DEBUG [c.c.c.ClusterManagerImpl] (Cluster-Heartbeat-1:ctx-9bbae21c) (logid:9a1b0b21) Management server heartbeat takes too long to finish. profiler: Done. Duration: 1935ms, profilerHeartbeatUpdate: Done. Duration: 617ms, profilerPeerScan: Done. Duration: 1317ms

Could this be a reason for the high CPU utilization?

--
With best regards, Ivan Kudryavtsev
Bitworks Software, Ltd.
Cell: +7-923-414-1515
WWW: http://bitworks.software/ <http://bw-sw.com/>
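To check whether the dead peer is what slows the heartbeat down, it might be worth counting how often that warning fires per daily log and looking at what the cluster bookkeeping records for the second node. A sketch, assuming the default 'cloud' MySQL user and database and that the management server rows live in the mshost table (verify the table layout on your version):

# slow-heartbeat warnings per rotated (daily) log file
zgrep -c 'heartbeat takes too long' management-server.log.*.gz

# current cluster view of each management server
mysql -u cloud -p cloud -e 'SELECT * FROM mshost\G'

If the warning count tracks the days on which the CPU climbs, the unreachable 10.252.2.2 peer would be the first thing to rule out.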