If the cluster remains stable from the HA point of view (i.e. no LRM failures, no service changes) and a service with a start error relocate policy configured fails to start, it cycles back and forth between two nodes, even if other untried nodes (on which the service might be able to start) are available. The reason for this behavior is that, if more than one node is possible, we select the one with the lowest active service count to minimize the impact on the cluster.

Since the start of the service failed on a node only a short time ago, it will probably also fail there on the next try (e.g. because storage is offline), whereas an untried node may well be able to start the service, which is our goal.

Fix that by excluding the already tried nodes from the top priority node list in 'select_service_node' if there are other possible nodes to try. We do that by passing select_service_node an array of the already tried nodes, which then get deleted from the selected top priority group.

If no node is left after that, we retry on the current node and hope that another node becomes available. While not ideal, this situation is most likely caused by the user, as our default start error relocation setting is one relocate try.

If all tries fail, we place the service in the error state. The tried nodes entry gets cleaned up after a user triggers an error recovery by disabling the service, so the information about the tried nodes stays in the manager status until then.

Signed-off-by: Thomas Lamprecht <t.lampre...@proxmox.com>
---

changes since v3:
* in case no node is left after deleting all tried nodes, just try the
  current one again instead of the quite hacky algorithm. Also log that
  case, else the user may wonder why the same node was tried again.

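For illustration only (not part of the patch): the selection rule described above can be sketched in a few lines. This is a simplified Python restatement, not the Perl code from Manager.pm; the function name `pick_relocation_target` and the usage numbers are hypothetical. It only models the top priority group and ignores groups, failback and the position of the current node.

```python
def pick_relocation_target(candidates, usage, tried_nodes):
    """Pick a relocation target from the top priority group.

    candidates:  node names in the top priority group (assumed online)
    usage:       dict mapping node name -> active service count
    tried_nodes: nodes on which the service start already failed

    Returns the least used untried node (ties broken by name), or None
    if every candidate was tried already. In the None case the caller
    is expected to retry the start on the current node.
    """
    tried = set(tried_nodes)
    untried = [node for node in candidates if node not in tried]
    if not untried:
        return None
    # lowest active service count first, then alphabetical order
    return min(untried, key=lambda node: (usage.get(node, 0), node))


if __name__ == "__main__":
    # Hypothetical usage counts loosely modelled on test-relocate-policy1:
    # node1 runs three services, node2 two, node3 one.
    usage = {"node1": 3, "node2": 2, "node3": 1}
    nodes = ["node1", "node2", "node3"]

    print(pick_relocation_target(nodes, usage, ["node3"]))
    print(pick_relocation_target(nodes, usage, ["node3", "node2"]))
    print(pick_relocation_target(nodes, usage, ["node3", "node2", "node1"]))
```

With these assumed counts the sketch visits node2 and then node1 after the initial failure on node3, and reports no target once all three nodes were tried, which corresponds to the order expected in test-relocate-policy1/log.expect.
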
 src/PVE/HA/Manager.pm                              | 19 +++++--
 src/test/test-relocate-policy-default-group/README |  7 +++
 .../test-relocate-policy-default-group/cmdlist     |  4 ++
 .../hardware_status                                |  5 ++
 .../test-relocate-policy-default-group/log.expect  | 53 ++++++++++++++++++
 .../manager_status                                 |  1 +
 .../service_config                                 |  3 +
 src/test/test-relocate-policy1/README              |  4 ++
 src/test/test-relocate-policy1/cmdlist             |  4 ++
 src/test/test-relocate-policy1/hardware_status     |  5 ++
 src/test/test-relocate-policy1/log.expect          | 64 ++++++++++++++++++++++
 src/test/test-relocate-policy1/manager_status      | 42 ++++++++++++++
 src/test/test-relocate-policy1/service_config      |  9 +++
 src/test/test-resource-failure6/log.expect         | 55 +++++++++++++++++++
 14 files changed, 271 insertions(+), 4 deletions(-)
 create mode 100644 src/test/test-relocate-policy-default-group/README
 create mode 100644 src/test/test-relocate-policy-default-group/cmdlist
 create mode 100644 src/test/test-relocate-policy-default-group/hardware_status
 create mode 100644 src/test/test-relocate-policy-default-group/log.expect
 create mode 100644 src/test/test-relocate-policy-default-group/manager_status
 create mode 100644 src/test/test-relocate-policy-default-group/service_config
 create mode 100644 src/test/test-relocate-policy1/README
 create mode 100644 src/test/test-relocate-policy1/cmdlist
 create mode 100644 src/test/test-relocate-policy1/hardware_status
 create mode 100644 src/test/test-relocate-policy1/log.expect
 create mode 100644 src/test/test-relocate-policy1/manager_status
 create mode 100644 src/test/test-relocate-policy1/service_config
 create mode 100644 src/test/test-resource-failure6/log.expect

diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 3f28094..6f6fdab 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -71,7 +71,7 @@ sub flush_master_status {
 }
 
 sub select_service_node {
-    my ($groups, $online_node_usage, $service_conf, $current_node, $try_next) = @_;
+    my ($groups, $online_node_usage, $service_conf, $current_node, $try_next, $tried_nodes) = @_;
 
     my $group = {};
     # add all online nodes to default group to allow try_next when no group set
@@ -106,7 +106,6 @@ sub select_service_node {
         }
     }
 
-
     my @pri_list = sort {$b <=> $a} keys %$pri_groups;
     return undef if !scalar(@pri_list);
 
@@ -119,6 +118,13 @@ sub select_service_node {
 
     my $top_pri = $pri_list[0];
 
+    # try to avoid nodes where the service failed already if we want to relocate
+    if ($try_next) {
+        foreach my $node (@$tried_nodes) {
+            delete $pri_groups->{$top_pri}->{$node};
+        }
+    }
+
     my @nodes = sort {
         $online_node_usage->{$a} <=> $online_node_usage->{$b} || $a cmp $b
     } keys %{$pri_groups->{$top_pri}};
@@ -661,8 +667,8 @@ sub next_state_started {
             }
         }
 
-        my $node = select_service_node($self->{groups}, $self->{online_node_usage},
-                                       $cd, $sd->{node}, $try_next);
+        my $node = select_service_node($self->{groups}, $self->{online_node_usage},
+                                       $cd, $sd->{node}, $try_next, $sd->{failed_nodes});
 
         if ($node && ($sd->{node} ne $node)) {
             if ($cd->{type} eq 'vm') {
@@ -673,6 +679,11 @@ sub next_state_started {
                 &$change_service_state($self, $sid, 'relocate', node => $sd->{node}, target => $node);
             }
         } else {
+            if ($try_next && !defined($node)) {
+                $haenv->log('warning', "Start Error Recovery: Tried all available " .
+                            " nodes for service '$sid', retry start on current node. " .
+                            "Tried nodes: " . join(', ', @{$sd->{failed_nodes}}));
+            }
             # ensure service get started again if it went unexpected down
             $sd->{uid} = compute_new_uuid($sd->{state});
         }
diff --git a/src/test/test-relocate-policy-default-group/README b/src/test/test-relocate-policy-default-group/README
new file mode 100644
index 0000000..18ee13a
--- /dev/null
+++ b/src/test/test-relocate-policy-default-group/README
@@ -0,0 +1,7 @@
+Test relocate policy on services with no group.
+Service 'fa:130' fails three times to restart and has a 'max_restart' policy
+of 0, thus will be relocated after each start try.
+As it has no group configured all available nodes should get chosen for
+when relocating.
+As we allow to relocate twice but the service fails three times we place
+it in the error state after all tries where used and all nodes where visited
diff --git a/src/test/test-relocate-policy-default-group/cmdlist b/src/test/test-relocate-policy-default-group/cmdlist
new file mode 100644
index 0000000..8f06508
--- /dev/null
+++ b/src/test/test-relocate-policy-default-group/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on"],
+    [ "service fa:130 enabled" ]
+]
diff --git a/src/test/test-relocate-policy-default-group/hardware_status b/src/test/test-relocate-policy-default-group/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-relocate-policy-default-group/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-relocate-policy-default-group/log.expect b/src/test/test-relocate-policy-default-group/log.expect
new file mode 100644
index 0000000..a7dd644
--- /dev/null
+++ b/src/test/test-relocate-policy-default-group/log.expect
@@ -0,0 +1,53 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'fa:130' on node 'node2'
+info 20 node1/crm: service 'fa:130': state changed from 'started' to 'request_stop'
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 24 node3/crm: status change wait_for_quorum => slave
+info 40 node1/crm: service 'fa:130': state changed from 'request_stop' to 'stopped'
+info 120 cmdlist: execute service fa:130 enabled
+info 120 node1/crm: service 'fa:130': state changed from 'stopped' to 'started' (node = node2)
+info 123 node2/lrm: starting service fa:130
+warn 123 node2/lrm: unable to start service fa:130
+err 123 node2/lrm: unable to start service fa:130 on local node after 0 retries
+warn 140 node1/crm: starting service fa:130 on node 'node2' failed, relocating service.
+info 140 node1/crm: relocate service 'fa:130' to node 'node1'
+info 140 node1/crm: service 'fa:130': state changed from 'started' to 'relocate' (node = node2, target = node1)
+info 143 node2/lrm: service fa:130 - start relocate to node 'node1'
+info 143 node2/lrm: service fa:130 - end relocate to node 'node1'
+info 160 node1/crm: service 'fa:130': state changed from 'relocate' to 'started' (node = node1)
+info 161 node1/lrm: got lock 'ha_agent_node1_lock'
+info 161 node1/lrm: status change wait_for_agent_lock => active
+info 161 node1/lrm: starting service fa:130
+warn 161 node1/lrm: unable to start service fa:130
+err 161 node1/lrm: unable to start service fa:130 on local node after 0 retries
+warn 180 node1/crm: starting service fa:130 on node 'node1' failed, relocating service.
+info 180 node1/crm: relocate service 'fa:130' to node 'node3'
+info 180 node1/crm: service 'fa:130': state changed from 'started' to 'relocate' (node = node1, target = node3)
+info 181 node1/lrm: service fa:130 - start relocate to node 'node3'
+info 181 node1/lrm: service fa:130 - end relocate to node 'node3'
+info 200 node1/crm: service 'fa:130': state changed from 'relocate' to 'started' (node = node3)
+info 205 node3/lrm: got lock 'ha_agent_node3_lock'
+info 205 node3/lrm: status change wait_for_agent_lock => active
+info 205 node3/lrm: starting service fa:130
+warn 205 node3/lrm: unable to start service fa:130
+err 205 node3/lrm: unable to start service fa:130 on local node after 0 retries
+err 220 node1/crm: recovery policy for service fa:130 failed, entering error state. Failed nodes: node2, node1, node3
+info 220 node1/crm: service 'fa:130': state changed from 'started' to 'error'
+err 225 node3/lrm: service fa:130 is in an error state and needs manual intervention. Look up 'ERROR RECOVERY' in the documentation.
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-relocate-policy-default-group/manager_status b/src/test/test-relocate-policy-default-group/manager_status
new file mode 100644
index 0000000..0967ef4
--- /dev/null
+++ b/src/test/test-relocate-policy-default-group/manager_status
@@ -0,0 +1 @@
+{}
diff --git a/src/test/test-relocate-policy-default-group/service_config b/src/test/test-relocate-policy-default-group/service_config
new file mode 100644
index 0000000..c3cc873
--- /dev/null
+++ b/src/test/test-relocate-policy-default-group/service_config
@@ -0,0 +1,3 @@
+{
+    "fa:130": { "node": "node2", "max_restart": "0", "max_relocate": "2" }
+}
diff --git a/src/test/test-relocate-policy1/README b/src/test/test-relocate-policy1/README
new file mode 100644
index 0000000..f0f12fd
--- /dev/null
+++ b/src/test/test-relocate-policy1/README
@@ -0,0 +1,4 @@
+Test if relocate policy selects the lowest populated node in addition to
+only those which weren't tried yet.
+As node 1 has the most services it should get selected as last even if its
+name sorts before the other ones.
diff --git a/src/test/test-relocate-policy1/cmdlist b/src/test/test-relocate-policy1/cmdlist
new file mode 100644
index 0000000..d253427
--- /dev/null
+++ b/src/test/test-relocate-policy1/cmdlist
@@ -0,0 +1,4 @@
+[
+    [ "power node1 on", "power node2 on", "power node3 on" ],
+    [ "service fa:130 enabled" ]
+]
diff --git a/src/test/test-relocate-policy1/hardware_status b/src/test/test-relocate-policy1/hardware_status
new file mode 100644
index 0000000..451beb1
--- /dev/null
+++ b/src/test/test-relocate-policy1/hardware_status
@@ -0,0 +1,5 @@
+{
+  "node1": { "power": "off", "network": "off" },
+  "node2": { "power": "off", "network": "off" },
+  "node3": { "power": "off", "network": "off" }
+}
diff --git a/src/test/test-relocate-policy1/log.expect b/src/test/test-relocate-policy1/log.expect
new file mode 100644
index 0000000..9859f6d
--- /dev/null
+++ b/src/test/test-relocate-policy1/log.expect
@@ -0,0 +1,64 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: adding new service 'fa:130' on node 'node3'
+info 20 node1/crm: service 'fa:130': state changed from 'started' to 'request_stop'
+info 21 node1/lrm: got lock 'ha_agent_node1_lock'
+info 21 node1/lrm: status change wait_for_agent_lock => active
+info 21 node1/lrm: starting service vm:100
+info 21 node1/lrm: service status vm:100 started
+info 21 node1/lrm: starting service vm:101
+info 21 node1/lrm: service status vm:101 started
+info 21 node1/lrm: starting service vm:102
+info 21 node1/lrm: service status vm:102 started
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 23 node2/lrm: starting service vm:103
+info 23 node2/lrm: service status vm:103 started
+info 23 node2/lrm: starting service vm:104
+info 23 node2/lrm: service status vm:104 started
+info 24 node3/crm: status change wait_for_quorum => slave
+info 25 node3/lrm: got lock 'ha_agent_node3_lock'
+info 25 node3/lrm: status change wait_for_agent_lock => active
+info 25 node3/lrm: starting service vm:105
+info 25 node3/lrm: service status vm:105 started
+info 40 node1/crm: service 'fa:130': state changed from 'request_stop' to 'stopped'
+info 120 cmdlist: execute service fa:130 enabled
+info 120 node1/crm: service 'fa:130': state changed from 'stopped' to 'started' (node = node3)
+info 125 node3/lrm: starting service fa:130
+warn 125 node3/lrm: unable to start service fa:130
+err 125 node3/lrm: unable to start service fa:130 on local node after 0 retries
+warn 140 node1/crm: starting service fa:130 on node 'node3' failed, relocating service.
+info 140 node1/crm: relocate service 'fa:130' to node 'node2'
+info 140 node1/crm: service 'fa:130': state changed from 'started' to 'relocate' (node = node3, target = node2)
+info 145 node3/lrm: service fa:130 - start relocate to node 'node2'
+info 145 node3/lrm: service fa:130 - end relocate to node 'node2'
+info 160 node1/crm: service 'fa:130': state changed from 'relocate' to 'started' (node = node2)
+info 163 node2/lrm: starting service fa:130
+warn 163 node2/lrm: unable to start service fa:130
+err 163 node2/lrm: unable to start service fa:130 on local node after 0 retries
+warn 180 node1/crm: starting service fa:130 on node 'node2' failed, relocating service.
+info 180 node1/crm: relocate service 'fa:130' to node 'node1'
+info 180 node1/crm: service 'fa:130': state changed from 'started' to 'relocate' (node = node2, target = node1)
+info 183 node2/lrm: service fa:130 - start relocate to node 'node1'
+info 183 node2/lrm: service fa:130 - end relocate to node 'node1'
+info 200 node1/crm: service 'fa:130': state changed from 'relocate' to 'started' (node = node1)
+info 201 node1/lrm: starting service fa:130
+warn 201 node1/lrm: unable to start service fa:130
+err 201 node1/lrm: unable to start service fa:130 on local node after 0 retries
+warn 220 node1/crm: starting service fa:130 on node 'node1' failed, relocating service.
+warn 220 node1/crm: Start Error Recovery: Tried all available nodes for service 'fa:130', retry start on current node. Tried nodes: node3, node2, node1
+info 221 node1/lrm: starting service fa:130
+info 221 node1/lrm: service status fa:130 started
+info 240 node1/crm: relocation policy successful for 'fa:130', failed nodes: node3, node2, node1
+info 720 hardware: exit simulation - done
diff --git a/src/test/test-relocate-policy1/manager_status b/src/test/test-relocate-policy1/manager_status
new file mode 100644
index 0000000..8cce913
--- /dev/null
+++ b/src/test/test-relocate-policy1/manager_status
@@ -0,0 +1,42 @@
+{
+   "master_node": "node1",
+   "node_status": {
+      "node1": "online",
+      "node2": "online",
+      "node3": "online"
+   },
+   "relocate_tried_nodes": {},
+   "service_status": {
+      "vm:100": {
+         "node": "node1",
+         "state": "started",
+         "uid": "hSIUPNL/lBjgyU4svobXlg"
+      },
+      "vm:101": {
+         "node": "node1",
+         "state": "started",
+         "uid": "vLuiMIZ5KBKzDZv2bkYLvA"
+      },
+      "vm:102": {
+         "node": "node1",
+         "state": "started",
+         "uid": "COPzO9cc+8Z3lUbWn8zCHA"
+      },
+      "vm:103": {
+         "node": "node2",
+         "state": "started",
+         "uid": "iktXhI6tCi8X6h8wQS9Uyw"
+      },
+      "vm:104": {
+         "node": "node2",
+         "state": "started",
+         "uid": "ySWup2on+tY88hdfzS1ymg"
+      },
+      "vm:105": {
+         "node": "node3",
+         "state": "started",
+         "uid": "RGRR9EOAzALG5cVMeWiKWA"
+      }
+   },
+   "timestamp": 10
+}
diff --git a/src/test/test-relocate-policy1/service_config b/src/test/test-relocate-policy1/service_config
new file mode 100644
index 0000000..d9f1823
--- /dev/null
+++ b/src/test/test-relocate-policy1/service_config
@@ -0,0 +1,9 @@
+{
+    "vm:100": { "node": "node1", "state": "enabled" },
+    "vm:101": { "node": "node1", "state": "enabled" },
+    "vm:102": { "node": "node1", "state": "enabled" },
+    "vm:103": { "node": "node2", "state": "enabled" },
+    "vm:104": { "node": "node2", "state": "enabled" },
+    "vm:105": { "node": "node3", "state": "enabled" },
+    "fa:130": { "node": "node3", "max_restart": "0", "max_relocate": "3" }
+}
diff --git a/src/test/test-resource-failure6/log.expect b/src/test/test-resource-failure6/log.expect
new file mode 100644
index 0000000..281a4ba
--- /dev/null
+++ b/src/test/test-resource-failure6/log.expect
@@ -0,0 +1,55 @@
+info 0 hardware: starting simulation
+info 20 cmdlist: execute power node1 on
+info 20 node1/crm: status change startup => wait_for_quorum
+info 20 node1/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node2 on
+info 20 node2/crm: status change startup => wait_for_quorum
+info 20 node2/lrm: status change startup => wait_for_agent_lock
+info 20 cmdlist: execute power node3 on
+info 20 node3/crm: status change startup => wait_for_quorum
+info 20 node3/lrm: status change startup => wait_for_agent_lock
+info 20 node1/crm: got lock 'ha_manager_lock'
+info 20 node1/crm: status change wait_for_quorum => master
+info 20 node1/crm: node 'node1': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node2': state changed from 'unknown' => 'online'
+info 20 node1/crm: node 'node3': state changed from 'unknown' => 'online'
+info 20 node1/crm: adding new service 'fa:130' on node 'node2'
+info 20 node1/crm: service 'fa:130': state changed from 'started' to 'request_stop'
+info 22 node2/crm: status change wait_for_quorum => slave
+info 23 node2/lrm: got lock 'ha_agent_node2_lock'
+info 23 node2/lrm: status change wait_for_agent_lock => active
+info 24 node3/crm: status change wait_for_quorum => slave
+info 40 node1/crm: service 'fa:130': state changed from 'request_stop' to 'stopped'
+info 120 cmdlist: execute service fa:130 enabled
+info 120 node1/crm: service 'fa:130': state changed from 'stopped' to 'started' (node = node2)
+info 123 node2/lrm: starting service fa:130
+warn 123 node2/lrm: unable to start service fa:130
+err 123 node2/lrm: unable to start service fa:130 on local node after 0 retries
+warn 140 node1/crm: starting service fa:130 on node 'node2' failed, relocating service.
+info 140 node1/crm: relocate service 'fa:130' to node 'node1'
+info 140 node1/crm: service 'fa:130': state changed from 'started' to 'relocate' (node = node2, target = node1)
+info 143 node2/lrm: service fa:130 - start relocate to node 'node1'
+info 143 node2/lrm: service fa:130 - end relocate to node 'node1'
+info 160 node1/crm: service 'fa:130': state changed from 'relocate' to 'started' (node = node1)
+info 161 node1/lrm: got lock 'ha_agent_node1_lock'
+info 161 node1/lrm: status change wait_for_agent_lock => active
+info 161 node1/lrm: starting service fa:130
+warn 161 node1/lrm: unable to start service fa:130
+err 161 node1/lrm: unable to start service fa:130 on local node after 0 retries
+warn 180 node1/crm: starting service fa:130 on node 'node1' failed, relocating service.
+info 180 node1/crm: relocate service 'fa:130' to node 'node3'
+info 180 node1/crm: service 'fa:130': state changed from 'started' to 'relocate' (node = node1, target = node3)
+info 181 node1/lrm: service fa:130 - start relocate to node 'node3'
+info 181 node1/lrm: service fa:130 - end relocate to node 'node3'
+info 200 node1/crm: service 'fa:130': state changed from 'relocate' to 'started' (node = node3)
+info 205 node3/lrm: got lock 'ha_agent_node3_lock'
+info 205 node3/lrm: status change wait_for_agent_lock => active
+info 205 node3/lrm: starting service fa:130
+warn 205 node3/lrm: unable to start service fa:130
+err 205 node3/lrm: unable to start service fa:130 on local node after 0 retries
+warn 220 node1/crm: starting service fa:130 on node 'node3' failed, relocating service.
+warn 220 node1/crm: Start Error Recovery: Tried all available nodes for service 'fa:130', retry start on current node. Tried nodes: node2, node1, node3
+info 225 node3/lrm: starting service fa:130
+info 225 node3/lrm: service status fa:130 started
+info 240 node1/crm: relocation policy successful for 'fa:130', failed nodes: node2, node1, node3
+info 720 hardware: exit simulation - done
--
2.1.4