Nux created CLOUDSTACK-10234:
--------------------------------

             Summary: HA fails in cases of PSU failure.
                 Key: CLOUDSTACK-10234
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-10234
             Project: CloudStack
          Issue Type: Improvement
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: Management Server
    Affects Versions: 4.11.0.0
         Environment: 4.11 RC1, NFS storage, CentOS 7 management server and hypervisors
            Reporter: Nux


To simulate a PSU failure, I physically pulled the power from the server. HA failed to do the right thing, i.e. move the affected VMs to other HVs.

I waited a good while, but alas, nothing happened: the VM and the VR running on the affected hypervisor were never moved to another one, even though I have two other hypervisors running.


This is what I see in the management server logs:
{code:java}
Caused by: com.cloud.utils.exception.CloudRuntimeException: Out-of-band Management action (OFF) on host (57bf86e0-e1cd-484e-a4f1-78b3ca2da125) failed with error: Get Auth Capabilities error Error issuing Get Channel Authentication Capabilities request Error: Unable to establish IPMI v2 / RMCP+ session
    at org.apache.cloudstack.outofbandmanagement.OutOfBandManagementServiceImpl.executePowerOperation(OutOfBandManagementServiceImpl.java:423)
    at sun.reflect.GeneratedMethodAccessor199.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    ... 21 more
2018-01-16 17:00:13,396 WARN  [o.a.c.alerts] (pool-5-thread-7:null) (logid:4f7299f6) AlertType:: 30 | dataCenterId:: 1 | podId:: 1 | clusterId:: null | message:: HA Fencing of host id=1, in dc id=1 performed
2018-01-16 17:00:15,375 DEBUG [c.c.a.t.Request] (pool-2-thread-27:null) (logid:6b21a8c1) Seq 5-9115285645797884785: Sending  { Cmd , MgmtId: 161334379813, via: 5(hv03.cloud.local), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.CheckOnHostCommand":{"host":{"guid":"598d48ef-158d-3e14-ad68-8d02c9368ddf-LibvirtComputingResource","privateNetwork":{"ip":"172.16.25.101","netmask":"255.255.255.240","mac":"0c:c4:7a:40:8e:f6","isSecurityGroupEnabled":false},"publicNetwork":{"ip":"172.16.25.101","netmask":"255.255.255.240","mac":"0c:c4:7a:40:8e:f6","isSecurityGroupEnabled":false},"storageNetwork1":{"ip":"172.16.25.101","netmask":"255.255.255.240","mac":"0c:c4:7a:40:8e:f6","isSecurityGroupEnabled":false}},"wait":20}}] }
2018-01-16 17:00:15,380 DEBUG [c.c.a.t.Request] (pool-2-thread-5:null) (logid:bb993597) Seq 4-6582855280332112812: Sending  { Cmd , MgmtId: 161334379813, via: 4(hv02.cloud.local), Ver: v1, Flags: 100011, [{"com.cloud.agent.api.CheckOnHostCommand":{"host":{"guid":"6ebb3010-9c49-3a9c-b620-ecbc9731aca2-LibvirtComputingResource","privateNetwork":{"ip":"172.16.25.100","netmask":"255.255.255.240","mac":"0c:c4:7a:40:8e:8e","isSecurityGroupEnabled":false},"publicNetwork":{"ip":"172.16.25.100","netmask":"255.255.255.240","mac":"0c:c4:7a:40:8e:8e","isSecurityGroupEnabled":false},"storageNetwork1":{"ip":"172.16.25.100","netmask":"255.255.255.240","mac":"0c:c4:7a:40:8e:8e","isSecurityGroupEnabled":false}},"wait":20}}] }
2018-01-16 17:00:15,423 DEBUG [c.c.a.t.Request] (AgentManager-Handler-4:null) (logid:) Seq 5-9115285645797884785: Processing:  { Ans: , MgmtId: 161334379813, via: 5, Ver: v1, Flags: 10, [{"com.cloud.agent.api.Answer":{"result":false,"details":"Heart is beating...","wait":0}}] }
2018-01-16 17:00:15,423 DEBUG [c.c.a.t.Request] (pool-2-thread-27:null) (logid:6b21a8c1) Seq 5-9115285645797884785: Received:  { Ans: , MgmtId: 161334379813, via: 5(hv03.cloud.local), Ver: v1, Flags: 10, { Answer } }
2018-01-16 17:00:15,423 DEBUG [c.c.a.m.AgentManagerImpl] (pool-2-thread-27:null) (logid:6b21a8c1) Details from executing class com.cloud.agent.api.CheckOnHostCommand: Heart is beating...
2018-01-16 17:00:15,427 DEBUG [c.c.a.t.Request] (AgentManager-Handler-6:null) (logid:) Seq 4-6582855280332112812: Processing:  { Ans: , MgmtId: 161334379813, via: 4, Ver: v1, Flags: 10, [{"com.cloud.agent.api.Answer":{"result":false,"details":"Heart is beating...","wait":0}}] }
2018-01-16 17:00:15,427 DEBUG [c.c.a.t.Request] (pool-2-thread-5:null) (logid:bb993597) Seq 4-6582855280332112812: Received:  { Ans: , MgmtId: 161334379813, via: 4(hv02.cloud.local), Ver: v1, Flags: 10, { Answer } }
2018-01-16 17:00:15,427 DEBUG [c.c.a.m.AgentManagerImpl] (pool-2-thread-5:null) (logid:bb993597) Details from executing class com.cloud.agent.api.CheckOnHostCommand: Heart is beating...
2018-01-16 17:00:16,217 INFO  [o.a.c.f.j.i.AsyncJobManagerImpl] (AsyncJobMgr-Heartbeat-1:ctx-d9c2c841) (logid:1b093681) Begin cleanup expired async-jobs
2018-01-16 17:00:16,218 INFO  [o.a.c.f.j.i.AsyncJobManagerImpl] (AsyncJobMgr-Heartbeat-1:ctx-d9c2c841) (logid:1b093681) End cleanup expired async-jobs
2018-01-16 17:00:17,392 WARN  [o.a.c.o.PowerOperationTask] (pool-6-thread-29:null) (logid:f9788c38) Out-of-band management background task operation=STATUS for host id=1 failed with: Out-of-band Management action (STATUS) on host (57bf86e0-e1cd-484e-a4f1-78b3ca2da125) failed with error: Get Auth Capabilities error Error issuing Get Channel Authentication Capabilities request Error: Unable to establish IPMI v2 / RMCP+ session
2018-01-16 17:00:17,422 DEBUG [o.a.c.o.OutOfBandManagementServiceImpl] (pool-5-thread-6:ctx-65225bcc) (logid:665de20f) Out-of-band Management action (OFF) on host (57bf86e0-e1cd-484e-a4f1-78b3ca2da125) failed with error: Get Auth Capabilities error Error issuing Get Channel Authentication Capabilities request Error: Unable to establish IPMI v2 / RMCP+ session
2018-01-16 17:00:17,438 WARN  [o.a.c.k.h.KVMHAProvider] (pool-5-thread-6:ctx-65225bcc) (logid:665de20f) OOBM service is not configured or enabled for this host hv01.cloud.local error is Out-of-band Management action (OFF) on host (57bf86e0-e1cd-484e-a4f1-78b3ca2da125) failed with error: Get Auth Capabilities error Error issuing Get Channel Authentication Capabilities request Error: Unable to establish IPMI v2 / RMCP+ session
2018-01-16 17:00:17,438 WARN  [o.a.c.h.t.BaseHATask] (pool-5-thread-9:null) (logid:ff44841a) Exception occurred while running FenceTask on a resource: org.apache.cloudstack.ha.provider.HAFenceException: OOBM service is not configured or enabled for this host hv01.cloud.local
org.apache.cloudstack.ha.provider.HAFenceException: OOBM service is not configured or enabled for this host hv01.cloud.local
    at org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:99)
    at org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:42)
    at org.apache.cloudstack.ha.task.FenceTask.performAction(FenceTask.java:42)
    at org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:86)
    at org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:83)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: com.cloud.utils.exception.CloudRuntimeException: Out-of-band Management action (OFF) on host (57bf86e0-e1cd-484e-a4f1-78b3ca2da125) failed with error: Get Auth Capabilities error Error issuing Get Channel Authentication Capabilities request Error: Unable to establish IPMI v2 / RMCP+ session
    at org.apache.cloudstack.outofbandmanagement.OutOfBandManagementServiceImpl.executePowerOperation(OutOfBandManagementServiceImpl.java:423)
    at sun.reflect.GeneratedMethodAccessor199.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    ... 21 more
2018-01-16 17:00:17,439 WARN  [o.a.c.alerts] (pool-5-thread-9:null) (logid:ff44841a) AlertType:: 30 | dataCenterId:: 1 | podId:: 1 | clusterId:: null | message:: HA Fencing of host id=1, in dc id=1 performed
2018-01-16 17:00:17,903 DEBUG [o.a.c.s.SecondaryStorageManagerImpl] (secstorage-1:ctx-ccb33721) (logid:722404aa) Zone 1 is ready to launch secondary storage VM
2018-01-16 17:00:17,935 DEBUG [c.c.c.ConsoleProxyManagerImpl] (consoleproxy-1:ctx-22a69a02) (logid:393fab21) Zone 1 is ready to launch console proxy
{code}
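To make the failure chain in the stack trace easier to follow, here is a simplified Java model of it. It is a hypothetical sketch, not CloudStack code: OobmClient, FenceException, fence() and mayRecoverVms() are illustrative names. It assumes, as the report implies, that the VMs are only restarted elsewhere after a successful fence, and that pulling the PSU also cuts power to the BMC, which would explain why every IPMI request fails.

{code:java}
/*
 * Hypothetical, simplified model of the fencing path in the stack trace
 * (FenceTask.performAction -> KVMHAProvider.fence -> OOBM power OFF).
 * None of these types are real CloudStack classes.
 */
public class FenceFlowSketch {

    /** Hypothetical stand-in for an out-of-band (IPMI) power-off request. */
    interface OobmClient {
        void powerOff(String hostUuid) throws Exception;
    }

    /** Hypothetical stand-in for org.apache.cloudstack.ha.provider.HAFenceException. */
    static class FenceException extends Exception {
        FenceException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Fencing succeeds only if the BMC accepts the OFF request; otherwise it throws. */
    static boolean fence(OobmClient oobm, String hostUuid, String hostName) throws FenceException {
        try {
            oobm.powerOff(hostUuid);
            return true;
        } catch (Exception e) {
            // Same wording as the KVMHAProvider WARN in the log, wrapping the IPMI error as the cause.
            throw new FenceException("OOBM service is not configured or enabled for this host " + hostName, e);
        }
    }

    /** Assumption: VMs are restarted elsewhere only after a successful fence. */
    static boolean mayRecoverVms(OobmClient oobm, String hostUuid, String hostName) {
        try {
            return fence(oobm, hostUuid, hostName);
        } catch (FenceException e) {
            // Corresponds to the BaseHATask "Exception occurred while running FenceTask" WARN.
            System.out.println("FenceTask failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        String uuid = "57bf86e0-e1cd-484e-a4f1-78b3ca2da125";
        // BMC powered and reachable: the OFF request is accepted.
        OobmClient liveBmc = hostUuid -> { };
        // PSU pulled: assuming the BMC lost power too, every IPMI request fails.
        OobmClient deadBmc = hostUuid -> {
            throw new Exception("Unable to establish IPMI v2 / RMCP+ session");
        };
        System.out.println(mayRecoverVms(liveBmc, uuid, "hv01.cloud.local")); // prints true
        System.out.println(mayRecoverVms(deadBmc, uuid, "hv01.cloud.local")); // prints the FenceTask line, then false
    }
}
{code}

In this model the fence step has a single dependency, the BMC, so a failure that also takes the BMC down leaves HA with no way to confirm the host is off, and recovery never starts.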



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)