On 08.10.21 14:52, Fabian Ebner wrote:
> If all services in 'fence' state are gone from a node (e.g. by
> removing the services) before fence_node() was successful, a node
> would get stuck in the 'fence' state. Avoid this by calling
> fence_node() if the node is in 'fence' state, regardless of service
> state.
>
> Reported in the community forum:
> https://forum.proxmox.com/threads/ha-migration-stuck-is-doing-nothing.94469/
>
> Signed-off-by: Fabian Ebner <f.eb...@proxmox.com>
> ---
>
> Not really sure if this is worth it, because it's a hard to reach edge
> case, but AFAICT there is no good way to get out of being stuck. What
> would work is either of:
> * Manually correcting the node state.
> * Adding a service to the stuck node and triggering a fence
>   situation.
>
> An alternative would be to keep services in 'fence' state in the
> manager state, even if they were removed from the config. But the
> approach from this patch seemed a bit more robust: for example, it
> will fix an already existing stuck state, rather than just avoid
> creating one.
>
>  src/PVE/HA/Manager.pm | 8 ++++++++
>  1 file changed, 8 insertions(+)
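For readers following along, the stuck-node condition from the commit message can be illustrated with a small toy model. This is only a sketch in Python, not the actual Perl code in src/PVE/HA/Manager.pm: the function name `nodes_to_fence`, its parameters, and the dictionary layout are all hypothetical and chosen purely for illustration. It assumes that, before the patch, fencing was only attempted for nodes that still had a service in 'fence' state, as the description above implies.

```python
def nodes_to_fence(node_status, service_status, check_node_state):
    """Return the sorted list of nodes a manager round would try to fence.

    node_status:      {node_name: state}, e.g. {"node2": "fence"} (hypothetical layout)
    service_status:   {service_id: {"node": ..., "state": ...}} (hypothetical layout)
    check_node_state: False models the behaviour before the patch,
                      True models the behaviour described by the patch.
    """
    # Assumed pre-patch behaviour: fencing is driven only by services
    # that are themselves in 'fence' state.
    targets = {
        svc["node"]
        for svc in service_status.values()
        if svc["state"] == "fence"
    }

    # Behaviour described by the patch: also fence every node that is
    # in 'fence' state, regardless of what its services look like.
    if check_node_state:
        targets |= {
            node for node, state in node_status.items() if state == "fence"
        }

    return sorted(targets)


# node2 entered 'fence' state, then its only HA service was removed
# before fencing succeeded, so no service in 'fence' state is left.
nodes = {"node1": "online", "node2": "fence"}
services = {}

stuck = nodes_to_fence(nodes, services, check_node_state=False)
fixed = nodes_to_fence(nodes, services, check_node_state=True)
```

In this model, `stuck` is empty, so nothing would ever move node2 out of 'fence' state, while `fixed` contains node2. It also shows why the approach repairs an already existing stuck state: the decision depends only on the current node state, not on how that state was reached.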

applied, thanks!

As also discussed off-list, I noticed a related issue in a derived
edge case that could cause trouble too. I spent some time coming up
with two tests covering your fixed situation plus mine, slightly
expanding the capabilities of the test/simulation system.

https://git.proxmox.com/?p=pve-ha-manager.git;a=commit;h=ca2e547a7662467f9a08c54fa15b46825e3702e6
https://git.proxmox.com/?p=pve-ha-manager.git;a=commit;h=30fc7ceedb7f3047659f22d063cc16c94c20dd7a

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel