Description of the problem, imagine the following:
We get the CRM command to migrate 'vm:100' from A to B. The migration
fails. Normally, the CRM would place the service in the started state
on the source node A once it processes our result. But if the CRM has
not processed our result before we start a new 'manage_resources' round
(which happens roughly every 5 seconds), the LRM may start another
migration try without the CRM knowing anything about it. Worse, the CRM
may process the result of the failed migration try at the same time and
mark the service as started on node A, while the LRM has now
successfully migrated the service to B with the second (hidden) try.

Now the state is out of sync:
The CRM has the service marked as started on node A, but it runs on
node B. We (currently) have no way to fix up a wrong node location of a
_running_ service, thus the LRM on node A fails with EWRONG_NODE and
the CRM places the service in the error state.

To fix that, we _never_ execute two exactly identical migrate commands
directly after each other. "Exactly identical" means that both the sid
and the target are the same.

Signed-off-by: Thomas Lamprecht <t.lampre...@proxmox.com>
---
 src/PVE/HA/LRM.pm | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/PVE/HA/LRM.pm b/src/PVE/HA/LRM.pm
index f53f26d..060ae9d 100644
--- a/src/PVE/HA/LRM.pm
+++ b/src/PVE/HA/LRM.pm
@@ -457,6 +457,14 @@ sub queue_resource_command {
 
     if (my $w = $self->{workers}->{$sid}) {
         return if $w->{pid}; # already started
+        if ($state eq 'migrate' && $w->{state} eq $state && $w->{target} eq $target) {
+            # ignore two identical migration tries directly after each other,
+            # as this means that the CRM did not get our result yet and a
+            # second, duplicate migration try is dangerous (EWRONG_NODE)!
+            $self->{haenv}->log('notice', "Ignore second identical migration call," .
+                                " CRM didn't processed our last result yet.");
+            return;
+        }
         # else, delete and overwrite queue entry with new command
         delete $self->{workers}->{$sid};
     }
-- 
2.1.4