We are only allowed to execute any command once, as otherwise we may
disturb the synchrony between the CRM and the LRMs.

The following scenario could happen:

schedule CRM: deploy task 'migrate' for service vm:100 with UID 1234
schedule LRM: fork task with UID 1234
schedule CRM: idle as no result available yet
schedule LRM: collect finished task UID and write result for CRM
              (the result was an error)
schedule LRM: fork task UID 1234 _AGAIN_
schedule CRM: processes the _previous_ erroneous result from the LRM
              and places vm:100 in started state on the source node
schedule LRM: collect finished task UID and write result for CRM
              (this time the task was successful and the service is on
              the target node, but the CRM knows _nothing_ of it!)
schedule LRM: try to schedule the task, but the service is not on this
              node! => error

To fix that, we _never_ execute the exact same command twice in a row;
"exact" means here that the UID of the command to queue already has a
valid result recorded. This enforces the originally intended SYN-ACK
behaviour between the CRM and the LRMs.

We now generate a new UID for services which do not change state, i.e.
when one of the following evaluates to true:

* disabled AND in stopped state
* enabled AND in started state

This ensures that the state set by the CRM also holds in the LRM and
thus, for example, a killed VM gets restarted.

Signed-off-by: Thomas Lamprecht <t.lampre...@proxmox.com>
---
changes since v4:
* do not introduce proccessed_uids, use the result hash instead
* rebase on current master
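(Not part of the patch - a minimal, self-contained Perl sketch of the
SYN-ACK behaviour described above, assuming a plain hash as the LRM
result store; the names lrm_try_queue and crm_steady_state_ack are
illustrative only, the real logic lives in queue_resource_command and
compute_new_uuid below:)

#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my $results = {};    # LRM side: UID => result of the finished task

# LRM side: queue a command only if its UID has no recorded result yet
sub lrm_try_queue {
    my ($uid) = @_;
    return 0 if defined($results->{$uid});    # CRM has not ACKed this result
    $results->{$uid} = 'error';               # pretend the task ran and failed
    return 1;
}

# CRM side: acknowledge a result by handing out a fresh UID, even for a
# steady state, so e.g. a killed VM gets started again under a new command
sub crm_steady_state_ack {
    my ($state) = @_;
    return md5_hex($state . rand());          # stand-in for compute_new_uuid()
}

my $uid = '1234';
print "first try:  ", lrm_try_queue($uid), "\n";    # 1 - runs (and fails)
print "second try: ", lrm_try_queue($uid), "\n";    # 0 - blocked, no double execute
$uid = crm_steady_state_ack('started');             # CRM processed the error
print "fresh UID:  ", lrm_try_queue($uid), "\n";    # 1 - new command may run

The second call returns 0 precisely because a result for UID 1234 still
exists; only once the CRM has consumed that result and issued a fresh
UID does the LRM execute a command again.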
 src/PVE/HA/LRM.pm                          |  6 ++++++
 src/PVE/HA/Manager.pm                      | 14 +++++++++-----
 src/test/test-resource-failure5/log.expect |  2 --
 3 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/src/PVE/HA/LRM.pm b/src/PVE/HA/LRM.pm
index 7bbfe46..c0dc8f4 100644
--- a/src/PVE/HA/LRM.pm
+++ b/src/PVE/HA/LRM.pm
@@ -455,6 +455,12 @@ sub manage_resources {
 sub queue_resource_command {
     my ($self, $sid, $uid, $state, $target) = @_;
 
+    # do not queue the exact same command twice, as this may lead to an
+    # inconsistent HA state when the first command fails but the CRM does
+    # not process its failure right away and the LRM starts a second
+    # try, without the CRM knowing of it (race condition)
+    return if $uid && defined($self->{results}->{$uid});
+
     if (my $w = $self->{workers}->{$sid}) {
 	return if $w->{pid}; # already started
 	# else, delete and overwrite queue entry with new command
diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm
index 32b0ad7..321b47c 100644
--- a/src/PVE/HA/Manager.pm
+++ b/src/PVE/HA/Manager.pm
@@ -517,12 +517,14 @@ sub next_state_stopped {
 	} else {
 	    $haenv->log('err', "unknown command '$cmd' for service '$sid'");
 	}
-    } 
+    }
 
     if ($cd->{state} eq 'disabled') {
-	# do nothing
+	# tell the LRM we are in a synced state and
+	# ensure that it stays stopped
+	$sd->{uid} = compute_new_uuid($sd->{state});
 	return;
-    } 
+    }
 
     if ($cd->{state} eq 'enabled') {
 	# simply mark it started, if it's on the wrong node
@@ -608,6 +610,7 @@ sub next_state_started {
 				 " (exit code $ec))");
 		# we have no save way out (yet) for other errors
 		&$change_service_state($self, $sid, 'error');
+		return;
 	    }
 	}
 
@@ -623,12 +626,13 @@ sub next_state_started {
 		    &$change_service_state($self, $sid, 'relocate', node => $sd->{node}, target => $node);
 		}
 	    } else {
-		# do nothing
+		# ensure the service gets started again if it went down unexpectedly
+		$sd->{uid} = compute_new_uuid($sd->{state});
 	    }
 	}
 
 	return;
-    } 
+    }
 
     $haenv->log('err', "service '$sid' - unknown state '$cd->{state}' in service configuration");
 }
diff --git a/src/test/test-resource-failure5/log.expect b/src/test/test-resource-failure5/log.expect
index b6e7807..283ca8c 100644
--- a/src/test/test-resource-failure5/log.expect
+++ b/src/test/test-resource-failure5/log.expect
@@ -31,8 +31,6 @@ err 143 node2/lrm: unable to start service fa:130 on local node after 1 r
 err 160 node1/crm: recovery policy for service fa:130 failed, entering error state!
 info 160 node1/crm: service 'fa:130': state changed from 'started' to 'error'
 warn 163 node2/lrm: service fa:130 is not running and in an error state
-warn 183 node2/lrm: service fa:130 is not running and in an error state
-warn 203 node2/lrm: service fa:130 is not running and in an error state
 info 220 cmdlist: execute service fa:130 disabled
 info 220 node1/crm: service 'fa:130': state changed from 'error' to 'stopped'
 info 820 hardware: exit simulation - done
-- 
2.1.4