[ 
https://issues.apache.org/jira/browse/IGNITE-19087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Gusakov updated IGNITE-19087:
------------------------------------
    Description:     (was: Sometimes we must cancel the ongoing rebalance:
 * We can receive an unrecoverable error from the replication group during the 
current rebalance
 * We can decide to cancel it manually

[!https://github.com/apache/ignite-3/raw/c276a334c4c3742520494bccdc957f4530c8ed7a/modules/distribution-zones/tech-notes/images/cancelRebalance.svg!|https://github.com/apache/ignite-3/blob/c276a334c4c3742520494bccdc957f4530c8ed7a/modules/distribution-zones/tech-notes/images/cancelRebalance.svg]

h3. 1. Put the cancel intent to the *.cancel key

To persist the cancel intent, we must save the (oldTopology, newTopology) 
pair of peer lists to the {{zoneId.assignment.cancel}} key. Also, every invoke 
that updates the {{*.cancel}} key must be conditioned on the revision of the 
pending key whose rebalance is being cancelled:
 {{    if(zoneId.assignment.pending.revision == inputRevision):
        zoneId.assignment.cancel = cancelValue
        return true
    else:
        return false}}
This is needed to prevent a race between rebalance completion and persisting 
the cancel intent; otherwise, we could try to cancel the wrong rebalance process.
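
The revision check above can be illustrated with a minimal sketch. It is not the Ignite meta storage API: {{InMemoryMetaStorage}}, {{put_cancel_intent}} and the key layout are hypothetical names chosen for illustration, and a real meta storage would have to run the check and the write as one atomic conditional invoke.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class InMemoryMetaStorage:
    """Hypothetical toy store: each key maps to (value, revision of last update)."""
    entries: Dict[str, Tuple[Any, int]] = field(default_factory=dict)
    revision: int = 0

    def put(self, key: str, value: Any) -> int:
        # Every update bumps the global revision and stamps the key with it.
        self.revision += 1
        self.entries[key] = (value, self.revision)
        return self.revision

    def revision_of(self, key: str) -> Optional[int]:
        entry = self.entries.get(key)
        return entry[1] if entry is not None else None


def put_cancel_intent(storage: InMemoryMetaStorage, zone_id: str,
                      old_topology: List[str], new_topology: List[str],
                      input_revision: int) -> bool:
    """Save the cancel intent only if the pending key still has input_revision."""
    pending_key = f"{zone_id}.assignment.pending"
    cancel_key = f"{zone_id}.assignment.cancel"
    # Assumption: in a real store this check and the write are a single atomic invoke.
    if storage.revision_of(pending_key) != input_revision:
        return False  # the pending key changed, so this is a different rebalance
    storage.put(cancel_key, (old_topology, new_topology))
    return True


storage = InMemoryMetaStorage()
rev = storage.put("zone1.assignment.pending", ["A", "B", "C"])
first = put_cancel_intent(storage, "zone1", ["A", "B"], ["A", "B", "C"], rev)
# A newer pending assignment arrives; the old revision must no longer match.
storage.put("zone1.assignment.pending", ["A", "B", "D"])
second = put_cancel_intent(storage, "zone1", ["A", "B"], ["A", "B", "C"], rev)
```

In this sketch the first call succeeds because the pending key still carries the revision the caller observed. The second call is rejected because the pending key was rewritten in the meantime.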
h3. 2. PrimaryReplica->ReplicationGroup cancel protocol

When the PrimaryReplica sends {{CancelRebalanceRequest(oldTopology, newTopology)}} 
to the ReplicationGroup, the following cases are possible:
 * The replication group has an ongoing rebalance oldTopology->newTopology. It 
must be cancelled, and the configuration state of the replication group must 
be cleaned up back to oldTopology.
 * The replication group has no ongoing rebalance and currentTopology==oldTopology. 
There is nothing to cancel, so a success response is returned.
 * The replication group has no ongoing rebalance and currentTopology==newTopology. 
The cancel request can't be executed, so a response saying so is returned. The 
final recipient of this response (the placement driver) must log this fact and 
run the same routine as for a usual rebalanceDone.)
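
The three cases of the cancel protocol can be sketched as a small decision function. This is only an illustration under stated assumptions: {{CancelResult}}, {{handle_cancel_request}} and the representation of a topology as a set of peer names are hypothetical. Raising an error for states the description does not cover is also an assumption of this sketch.

```python
from enum import Enum
from typing import FrozenSet, Iterable, Optional, Tuple

Topology = FrozenSet[str]


class CancelResult(Enum):
    CANCELLED = "cancelled"                    # ongoing rebalance was rolled back
    NOTHING_TO_CANCEL = "nothing_to_cancel"    # group is already on oldTopology
    ALREADY_REBALANCED = "already_rebalanced"  # group is already on newTopology


def handle_cancel_request(
    current: Iterable[str],
    ongoing: Optional[Tuple[Iterable[str], Iterable[str]]],
    old: Iterable[str],
    new: Iterable[str],
) -> Tuple[CancelResult, Topology]:
    """Return the outcome and the topology the group ends up with."""
    current_t, old_t, new_t = frozenset(current), frozenset(old), frozenset(new)
    if ongoing is not None:
        src, dst = frozenset(ongoing[0]), frozenset(ongoing[1])
        if (src, dst) == (old_t, new_t):
            # Case 1: cancel and clean up the configuration back to oldTopology.
            return CancelResult.CANCELLED, old_t
        # Not covered by the description; rejecting it is an assumption.
        raise ValueError("ongoing rebalance does not match the cancel request")
    if current_t == old_t:
        # Case 2: nothing to cancel, success response.
        return CancelResult.NOTHING_TO_CANCEL, current_t
    if current_t == new_t:
        # Case 3: rebalance already finished; the placement driver logs this
        # and proceeds as for a usual rebalanceDone.
        return CancelResult.ALREADY_REBALANCED, current_t
    # Not covered by the description; rejecting it is an assumption.
    raise ValueError("current topology matches neither old nor new topology")


old_peers, new_peers = ["A", "B"], ["A", "B", "C"]
case1 = handle_cancel_request(old_peers, (old_peers, new_peers), old_peers, new_peers)
case2 = handle_cancel_request(old_peers, None, old_peers, new_peers)
case3 = handle_cancel_request(new_peers, None, old_peers, new_peers)
```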

> Cancel rebalance mechanism
> --------------------------
>
>                 Key: IGNITE-19087
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19087
>             Project: Ignite
>          Issue Type: Task
>            Reporter: Kirill Gusakov
>            Priority: Major
>              Labels: ignite-3
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
