andrijapanicsb opened a new issue, #13376:
URL: https://github.com/apache/cloudstack/issues/13376
# Host-HA never marks a powered-off KVM host `Down` because the fence (OOBM power-off) can't succeed against an already-off chassis; VM-HA only triggers once the dead host is powered back on
## ISSUE TYPE
- Bug Report
## COMPONENT NAME
~~~
HA (host-HA framework), Out-of-band Management (Redfish/IPMI), KVM
~~~
## CLOUDSTACK VERSION
~~~
Confirmed present with identical (or functionally identical) logic on:
- tag 4.22.1.0 (analyzed in detail)
- branch 4.22 (origin/4.22 @ 21b2025c) — all key files byte-identical
to 4.22.1.0
- branch main (origin/main @ 6bc83a3c) — all key files byte-identical
to 4.22.1.0
- branch 4.20 (origin/4.20 @ a3970bb1): same logic; differences are
cosmetic only (method rename getHostStatus() ->
getHostStatusFromHAConfig(); logger formatting)
There is NO 4.21 release branch upstream (release branches go 4.20 -> 4.22).
The host-HA + OOBM fence design predates 4.20, so earlier 4.x releases are
very likely affected too.
Per-branch verification of the relevant elements:
- KVMHAProvider.fence() = OOBM PowerOperation.OFF, returns
resp.getSuccess(): same on 4.20 / 4.22 / main
- FenceTask: only transitions to Fenced on success; retries Fencing
otherwise: byte-identical on 4.20 / 4.22 / main
- HAManagerImpl host-status mapping (Fenced->Down, Fencing->Disconnected):
same on 4.20 (getHostStatus) and 4.22/main (getHostStatusFromHAConfig)
- RedfishWrapper: PowerOperation.OFF -> RedfishResetCmd.GracefulShutdown:
byte-identical on 4.20 / 4.22 / main
- RedfishClient: throws unless HTTP status in 2XX
(SC_OK..SC_MULTIPLE_CHOICES): byte-identical on 4.20 / 4.22 / main
~~~
## CONFIGURATION
- KVM cluster with **host-HA enabled** on the hosts.
- **Out-of-band Management enabled** per host (reproduced with the
**Redfish** driver against Dell iDRAC; the same logic applies to the
**ipmitool** driver).
- VM-HA enabled (`VmHaEnabled`).
- Primary storage: Linstor (not material — `isStorageSupportHA() == true`,
so the legacy investigator is not the bottleneck here).
## OS / ENVIRONMENT
- Management servers: Ubuntu 24.04, OpenJDK 21.
- Hypervisors: KVM.
- BMC: Dell iDRAC via Redfish (`/redfish/v1/Systems/System.Embedded.1`).
## SUMMARY
When a KVM host that has host-HA + OOBM enabled is **hard powered off**
(e.g. forced chassis-off from the BMC console, or a real power/cable failure),
CloudStack **never transitions the host to `Down`** and therefore **never
restarts its VMs on other hosts**. The host stays in `Alert`/`Disconnected`
indefinitely.
Root cause: the host-HA state machine only declares a host dead
(`HAState.Fenced` → investigator `Status.Down`) **after a successful fence**,
and the fence is implemented as an **active OOBM power-off**. Against an
already-off chassis that power-off cannot succeed (the BMC rejects it), so the
host is pinned in the `Fencing` state and retried forever. The investigator
maps `Fencing` to `Status.Disconnected`, not `Status.Down`, so VM-HA is never
invoked.
The perverse result: **the VMs are only recovered once the original (dead)
host is powered back on** — at which point the pending power-off finally
succeeds, the host transitions to `Fenced`/`Down`, and HA restarts the VMs
elsewhere. This defeats the purpose of HA.
**All three current branches are affected by the same issue:** the
relevant code is byte-identical on `4.22` and `main`, and functionally
identical on `4.20` (only a method rename and logger formatting differ). There
is no `4.21` branch upstream. Per-element diff verification is in the
CLOUDSTACK VERSION section above.
## STEPS TO REPRODUCE
1. KVM cluster, host-HA enabled, OOBM (Redfish or ipmitool) configured and
enabled on the hosts, VM-HA enabled. Place some HA-enabled VMs (incl. system
VMs) on `hostA`.
2. Forcefully power off `hostA` at the BMC (chassis power off / simulate
power loss). The BMC itself stays reachable.
3. Observe `hostA` in CloudStack over the next 20+ minutes.
### EXPECTED RESULTS
- Health check fails → activity check fails → host is fenced → host marked
`Down` → VM-HA restarts `hostA`'s VMs on other hosts within a few minutes.
### ACTUAL RESULTS
- `hostA` remains in `Alert` (host status) with the host-HA state stuck in
`Fencing`.
- The OOBM **STATUS** poll correctly reports the chassis as `Off` the entire
time, but that knowledge is never used to declare the host down.
- The agent investigator repeatedly reports the host as `Up` (while HA state
is `Suspect`) and then `Disconnected` (while HA state is `Fencing`) — **never
`Down`**.
- VMs are **not** restarted; the scheduler keeps preferring the VM's last
host (the dead `hostA`).
- The instant `hostA` is powered back **on**, the fence power-off finally
succeeds → host goes `Down` → VM-HA restarts the VMs on other hosts.
## ROOT CAUSE ANALYSIS
### Decision chain (only `Fenced` yields `Down`)
1. For an HA-eligible KVM host, the legacy investigator delegates to the
host-HA framework:
- `KVMInvestigator.getHostAgentStatus()` →
`haManager.getHostStatusFromHAConfig(host)`
(`plugins/hypervisors/kvm/src/main/java/com/cloud/ha/KVMInvestigator.java:81`)
2. `HAManagerImpl.getHostStatusFromHAConfig()` maps HA state → host status
(`server/src/main/java/org/apache/cloudstack/ha/HAManagerImpl.java:315`):
- `Fenced` → `Status.Down`
- `Degraded` / `Recovering` / `Fencing` → `Status.Disconnected`
- everything else (`Available`/`Suspect`/`Checking`/`Recovered`) →
`Status.Up`
3. `AgentManagerImpl` only fires the `HostDown` event and
`scheduleRestartForVmsOnHost(...)` when the investigator returns `Status.Down`
(`engine/orchestration/src/main/java/com/cloud/agent/manager/AgentManagerImpl.java:1147`,
`:1200`).
So VM-HA for an HA-eligible KVM host requires the host-HA state machine to
reach **`Fenced`**.
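
The mapping in step 2 can be sketched as a small standalone program. This is a simplified illustration, not the `HAManagerImpl` code: the class name `HaStatusMappingSketch` and the helper `triggersVmHa` are hypothetical, and the enums contain only the states named in this report (the real `HAState` and `Status` enums are assumed to have more values).

```java
final class HaStatusMappingSketch {
    // Only the HA states named in this report; the real enum is assumed to have more.
    enum HAState { Available, Suspect, Checking, Degraded, Recovering, Recovered, Fencing, Fenced }

    // Only the host statuses relevant to the decision chain.
    enum Status { Up, Disconnected, Down }

    /** Mirrors the mapping described in step 2 of the decision chain. */
    static Status toHostStatus(HAState state) {
        switch (state) {
            case Fenced:
                return Status.Down;
            case Degraded:
            case Recovering:
            case Fencing:
                return Status.Disconnected;
            default:
                return Status.Up;
        }
    }

    /** Hypothetical helper: per step 3, VM-HA is only scheduled on Status.Down. */
    static boolean triggersVmHa(HAState state) {
        return toHostStatus(state) == Status.Down;
    }

    public static void main(String[] args) {
        for (HAState state : HAState.values()) {
            System.out.println(state + " -> " + toHostStatus(state)
                    + ", VM-HA scheduled: " + triggersVmHa(state));
        }
    }
}
```

Running it shows that `Fenced` is the only state for which `triggersVmHa` returns `true`; a host parked in `Fencing` is reported as `Disconnected` no matter how long it stays there.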
### Reaching `Fenced` requires a *successful* power-off
- The state machine only goes `Fencing → Fenced` on `Event.Fenced`
(`api/src/main/java/org/apache/cloudstack/ha/HAConfig.java:139`).
- `FenceTask.processResult()` only fires `Event.Fenced` when the fence
returned `true`; otherwise it does nothing and the poll loop retries `Fencing`
forever via `RetryFencing`
(`server/src/main/java/org/apache/cloudstack/ha/task/FenceTask.java:45`;
retry at
`server/src/main/java/org/apache/cloudstack/ha/HAManagerImpl.java:724`).
- The fence is an active OOBM power-off:
`KVMHAProvider.fence()` →
`outOfBandManagementService.executePowerOperation(host, PowerOperation.OFF,
null)` and returns `resp.getSuccess()`
(`plugins/hypervisors/kvm/src/main/java/org/apache/cloudstack/kvm/ha/KVMHAProvider.java:87`).
- `executePowerOperation()` **throws** `CloudRuntimeException` whenever the
driver response is not successful — it never returns `success=false`
(`server/src/main/java/org/apache/cloudstack/outofbandmanagement/OutOfBandManagementServiceImpl.java:432`).
### Why the power-off fails against an already-off host (Redfish)
- The Redfish driver maps `PowerOperation.OFF` →
`RedfishResetCmd.GracefulShutdown`
(`plugins/outofbandmanagement-drivers/redfish/src/main/java/org/apache/cloudstack/outofbandmanagement/driver/redfish/RedfishWrapper.java:34`).
- `RedfishClient.executeComputerSystemReset()` POSTs to
`.../Actions/ComputerSystem.Reset` and throws `RedfishException` if the HTTP
status is not 2XX
(`utils/src/main/java/org/apache/cloudstack/utils/redfish/RedfishClient.java:300-312`).
- An already-off system returns **HTTP 409 (Conflict)** — a
`GracefulShutdown` is invalid because there is no running OS to shut down. 409
∉ 2XX → `RedfishException` → `CloudRuntimeException` → `HAFenceException` →
`FenceTask` sees `result=false` → **no `Fenced` transition** → stuck in
`Fencing`.
- (The ipmitool driver has the analogous failure mode: `chassis power off`
against an already-off / unreachable BMC returns a non-zero exit code, judged
purely by process exit status with no "already in target state" handling —
`IpmitoolWrapper.executeCommands()` → `result.isSuccess()`.)
### Net effect
The fence requires confirming an **active power-off transition**, but a host
that is already off (precisely the case where restarting its VMs is safe)
cannot be "powered off successfully." The safety mechanism deadlocks in exactly
the scenario it exists to handle. VMs recover only when the dead host returns.
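
The deadlock can be reproduced in miniature with a toy model. Everything here is a hypothetical stand-in: `FakeBmc`, `fence`, and `runFenceAttempts` are illustrative names, not CloudStack classes; the 204 success code is a placeholder for "any 2XX"; and the retry loop is a simplification of the `FenceTask` / `RetryFencing` cycle described above.

```java
final class FenceDeadlockSketch {
    /** Toy BMC: rejects a shutdown request with 409 when the chassis is already off. */
    static final class FakeBmc {
        boolean poweredOn;

        FakeBmc(boolean poweredOn) {
            this.poweredOn = poweredOn;
        }

        /** Returns an HTTP-like status code for a GracefulShutdown request. */
        int postGracefulShutdown() {
            if (!poweredOn) {
                return 409; // nothing running to shut down
            }
            poweredOn = false;
            return 204; // placeholder for any 2XX success code
        }
    }

    enum HAState { Fencing, Fenced }

    /** Fence = active power-off; throws unless the BMC answers with a 2XX status. */
    static boolean fence(FakeBmc bmc) {
        int status = bmc.postGracefulShutdown();
        if (status < 200 || status >= 300) {
            throw new IllegalStateException(
                    "The expected HTTP status code is '2XX' but it got '" + status + "'");
        }
        return true;
    }

    /** Retries the fence up to `attempts` times; only a successful fence leaves Fencing. */
    static HAState runFenceAttempts(FakeBmc bmc, int attempts) {
        HAState state = HAState.Fencing;
        for (int i = 0; i < attempts && state == HAState.Fencing; i++) {
            try {
                if (fence(bmc)) {
                    state = HAState.Fenced;
                }
            } catch (IllegalStateException e) {
                // Failed fence: stay in Fencing and retry on the next round.
            }
        }
        return state;
    }

    public static void main(String[] args) {
        FakeBmc bmc = new FakeBmc(false);              // host was hard powered off
        System.out.println(runFenceAttempts(bmc, 300)); // still Fencing after 300 tries
        bmc.poweredOn = true;                           // operator powers the dead host back on
        System.out.println(runFenceAttempts(bmc, 1));   // the pending power-off now succeeds
    }
}
```

In this model the state only reaches `Fenced` after `poweredOn` is set back to `true`, which matches the observed behavior: recovery is gated on the dead host returning.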
## LOG EVIDENCE
Two-MS cluster; host `kvm-host01`, id:1, uuid `aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee`. Hostnames, IPs, and VM names below are anonymized examples.
OOBM STATUS poll knew the chassis was off the whole time (MS #1 log):
~~~
15:10:47 OutOfBandManagementServiceImpl Transitioned out-of-band
management power state from On to Off
due to event: Off(Chassis Power is Off) for Host {id:1, kvm-host01}
~~~
Investigator never returns Down — `Up` while `Suspect`, then `Disconnected`
while `Fencing` (MS #1 log):
~~~
15:07:51 KVMInvestigator was able to determine host {id:1} is in Up
... is considered Up (...). State: Suspect, Most recent health
check failed.
15:14:51 HAManagerImpl HA: Agent [{id:1}] is disconnected. State: Fencing,
The resource is undergoing fence operation.
~~~
The fence itself, on the MS node that owns the HA config (MS #2 log) —
repeated every ~4s for ~20 min:
~~~
15:14:20 (first) ... it got '409'
15:14:28 KVMHAProvider OOBM service is not configured or enabled for this
host {id:1} error is
Failed to execute System power command ... 'POST' ...
'.../Actions/ComputerSystem.Reset' ... The expected HTTP status
code is '2XX' but it got '409'.
15:14:28 FenceTask Exception occurred while running FenceTask ...
org.apache.cloudstack.ha.provider.HAFenceException ... at
KVMHAProvider.fence(KVMHAProvider.java:99)
~~~
Counts over the outage: ~618 × `409`, ~308 × `HAFenceException`, 930 ×
`Fencing` state lines, `Starting HA on ... = 1` (only at the very end).
VM-HA only fires after the host is powered back on (MS #2 log):
~~~
15:35:03 HighAvailabilityManagerExtImpl Scheduling restart for VMs on host
{id:1, kvm-host01}
15:35:03 Host [kvm-host01 (id:1) ...] is down. Starting HA on the
following VMs: vm-app01 vm-app02
~~~
(chassis Off→On detected ~15:35:05 in MS #1 log.)
## SECONDARY BUGS surfaced by this incident
1. **Misleading error message.** Every fence failure logs `OOBM service is
not configured or enabled for this host ...`, but OOBM *is* configured and
working. The catch-all in `KVMHAProvider.fence()`
(`plugins/hypervisors/kvm/src/main/java/org/apache/cloudstack/kvm/ha/KVMHAProvider.java:97-100`)
assumes any exception means "OOBM not configured," hiding the real cause (HTTP
409 / already off). This actively misdirects troubleshooting.
2. **Misleading "fencing performed" alerts.** Each *failed* fence attempt
emits `alertType=30 — "HA Fencing of host id=1 ... performed"` because
`FenceTask.processResult()` calls `sendAlert(resource, HAState.Fencing)`
unconditionally regardless of `result`
(`server/src/main/java/org/apache/cloudstack/ha/task/FenceTask.java:54`).
Admins receive a flood of "fencing performed" alerts while fencing is in fact
failing continuously.
## SUGGESTED FIX (direction)
Make fencing treat "host is already off" as a successful fence, and stop
hiding the real error:
1. In `KVMHAProvider.fence()`, query OOBM power **STATUS** first; if the
chassis is already `Off`, return `true` (host is effectively fenced) instead of
issuing a power-off that 409s. (A confirmed-off host is safe to declare fenced.)
2. Redfish driver: treat an idempotent power-off (target state already
reached, HTTP 409 on `GracefulShutdown`/`ForceOff` when already off) as
success; and/or prefer `ForceOff` over `GracefulShutdown` for the HA fence path.
3. Fix the `fence()` catch block to surface the actual driver error rather
than "OOBM not configured."
4. Make `FenceTask` alerts reflect actual success/failure of the fence.
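
A minimal sketch of fix directions 1, 3, and 4, under stated assumptions: `Oobm`, `FakeOobm`, `FenceException`, and `alertSubject` are hypothetical stand-ins written for this example and are not the CloudStack OOBM service, `HAFenceException`, or alert API. The " failed" alert wording is likewise an illustration, not existing text. The idea is only to show the control flow: check power status first, treat confirmed-off as fenced, and propagate the driver's own error message.

```java
final class StatusFirstFenceSketch {
    enum PowerState { On, Off }

    /** Hypothetical stand-in for the OOBM service; not a CloudStack interface. */
    interface Oobm {
        PowerState status();

        void powerOff(); // throws RuntimeException if the BMC rejects the request
    }

    /** Toy BMC: rejects power-off when already off, or when a driver error is injected. */
    static final class FakeOobm implements Oobm {
        PowerState state;
        String powerOffError; // when non-null, powerOff() fails with this message

        FakeOobm(PowerState state) {
            this.state = state;
        }

        public PowerState status() {
            return state;
        }

        public void powerOff() {
            if (state == PowerState.Off) {
                throw new IllegalStateException("expected 2XX but got 409");
            }
            if (powerOffError != null) {
                throw new IllegalStateException(powerOffError);
            }
            state = PowerState.Off;
        }
    }

    static final class FenceException extends RuntimeException {
        FenceException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Direction 1: a chassis already reported Off counts as fenced. */
    static boolean fence(Oobm oobm) {
        try {
            if (oobm.status() == PowerState.Off) {
                return true; // confirmed off, no power-off request needed
            }
            oobm.powerOff();
            return oobm.status() == PowerState.Off;
        } catch (RuntimeException e) {
            // Direction 3: surface the driver's real error instead of a generic message.
            throw new FenceException("Fence via OOBM failed: " + e.getMessage(), e);
        }
    }

    /** Direction 4: alert text depends on the actual outcome. */
    static String alertSubject(long hostId, boolean fenced) {
        return "HA Fencing of host id=" + hostId + (fenced ? " performed" : " failed");
    }

    public static void main(String[] args) {
        boolean fenced = fence(new FakeOobm(PowerState.Off));
        System.out.println(alertSubject(1, fenced));
    }
}
```

Whether a status-first check is sufficient on its own depends on how much the BMC's reported power state can be trusted; if that is a concern, the driver-level change in direction 2 would be the complementary fix.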
## NOTES
- Analyzed against git tag `4.22.1.0`.
- Storage (Linstor) is not the bottleneck:
`LinstorPrimaryDataStoreDriverImpl.isStorageSupportHA()` returns `true`, so the
legacy KVM investigator does not short-circuit; the host-HA framework path
(above) is in effect.