I have KVM Host HA enabled and power is lost to one of the compute nodes. The 
host has its state marked as Alert and the HA state goes through Degraded to 
Suspect to Fencing.

The problem is that the host is never fenced: there is no power to it, so 
none of the OOBM commands work, which means the VMs are never migrated.

From the management server logs:

2019-03-04 11:02:48,288 WARN  [o.a.c.h.t.BaseHATask] (pool-6-thread-9:null) (logid:d0a19f20) Exception occurred while running FenceTask on a resource: org.apache.cloudstack.ha.provider.HAFenceException: OOBM service is not configured or enabled for this host dcp-cscn2.local
org.apache.cloudstack.ha.provider.HAFenceException: OOBM service is not configured or enabled for this host dcp-cscn2.local
        at org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:99)
        at org.apache.cloudstack.kvm.ha.KVMHAProvider.fence(KVMHAProvider.java:42)
        at org.apache.cloudstack.ha.task.FenceTask.performAction(FenceTask.java:42)
        at org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:86)
        at org.apache.cloudstack.ha.task.BaseHATask$1.call(BaseHATask.java:83)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: com.cloud.utils.exception.CloudRuntimeException: Out-of-band Management action (OFF) on host (b53122bc-1446-4ffd-a179-e363ad0d541f) failed with error: Get Auth Capabilities error
Error issuing Get Channel Authentication Capabilities request
Error: Unable to establish IPMI v2 / RMCP+ session

        at org.apache.cloudstack.outofbandmanagement.OutOfBandManagementServiceImpl.executePowerOperation(OutOfBandManagementServiceImpl.java:423)
        at sun.reflect.GeneratedMethodAccessor225.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        ... 21 more


This raises the question: how is this meant to work for a host whose power 
has failed?


If I turn off KVM Host HA and change the ping interval to 30 and the ping 
timeout to 2, then the VMs fail over to another host within 5 minutes.
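
For reference, a rough sketch of the arithmetic behind those two settings. It 
assumes the ping timeout acts as a multiplier on the ping interval, which is 
my understanding of the global settings rather than something confirmed from 
the code. The function name is made up for illustration.

```python
def agent_timeout_seconds(ping_interval, ping_timeout):
    """Seconds without a ping before the agent is treated as timed out.

    Assumption: ping timeout is a multiplier applied to ping interval.
    """
    if ping_interval <= 0 or ping_timeout <= 0:
        raise ValueError("both settings must be positive")
    return ping_interval * ping_timeout


# Values from my test: ping interval 30, ping timeout 2.
print(agent_timeout_seconds(30, 2))
```

Under that assumption the host would be noticed as missing after about a 
minute, which is well inside the 5 minutes I see end to end.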

I understand what Host HA is meant for, but it does not seem to work for a 
host that has failed because of a loss of power.

Jon
