If all services in 'fence' state are gone from a node (e.g. because the
services were removed) before fence_node() succeeded, the node would get
stuck in the 'fence' state. Avoid this by calling fence_node() whenever
a node is in 'fence' state, regardless of service state.

Reported in the community forum:
https://forum.proxmox.com/threads/ha-migration-stuck-is-doing-nothing.94469/

Signed-off-by: Fabian Ebner <f.eb...@proxmox.com>
---

Not really sure if this is worth it, because it's a hard to reach edge
case, but AFAICT there is no good way to get out of being stuck. What
would work is either of:
* Manually correcting the node state.
* Adding a service to the stuck node and triggering a fence situation.

An alternative would be to keep services in 'fence' state in the manager
state, even if they were removed from the config. But the approach from
this patch seemed a bit more robust: for example, it will fix an already
existing stuck state, rather than just avoid creating one.

 src/PVE/HA/Manager.pm | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 1c66b43..fc445b1 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -472,6 +472,14 @@ sub manage {
             $repeat = 1; # for faster execution
         }
 
+        # Avoid that a node without services in 'fence' state gets stuck in 'fence' state.
+        for my $node (sort keys $ns->{status}->%*) {
+            next if $ns->get_node_state($node) ne 'fence';
+            next if defined($fenced_nodes->{$node});
+
+            $fenced_nodes->{$node} = $ns->fence_node($node) || 0;
+        }
+
         last if !$repeat;
     }
 
-- 
2.30.2