[ https://issues.apache.org/jira/browse/IGNITE-19087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kirill Gusakov updated IGNITE-19087:
------------------------------------
    Description:     (was: Sometimes we must cancel an ongoing rebalance:
 * We can receive an unrecoverable error from the replication group during the current rebalance.
 * We can decide to cancel it manually.

[!https://github.com/apache/ignite-3/raw/c276a334c4c3742520494bccdc957f4530c8ed7a/modules/distribution-zones/tech-notes/images/cancelRebalance.svg!|https://github.com/apache/ignite-3/blob/c276a334c4c3742520494bccdc957f4530c8ed7a/modules/distribution-zones/tech-notes/images/cancelRebalance.svg]

h3. 1. Put rebalance intent to *.cancel key

To persist the cancel intent, we must save the (oldTopology, newTopology) pair of peer lists to the {{zoneId.assignment.cancel}} key. In addition, every invoke that updates the {{*.cancel}} key must carry the revision of the pending key that is to be cancelled:

{{if(zoneId.assignment.pending.revision == inputRevision):
    zoneId.assignment.cancel = cancelValue
    return true
else:
    return false}}

This check prevents a race between rebalance completion and persisting the cancel intent; without it, we could try to cancel the wrong rebalance process.

h3. 2. PrimaryReplica->ReplicationGroup cancel protocol

When PrimaryReplica sends {{CancelRebalanceRequest(oldTopology, newTopology)}} to the ReplicationGroup, the following cases are possible:
 * The replication group has an ongoing rebalance oldTopology->newTopology. The rebalance must be cancelled, and the configuration state of the replication group must be cleaned up back to oldTopology.
 * The replication group has no ongoing rebalance and currentTopology==oldTopology. There is nothing to cancel, so a success response is returned.
 * The replication group has no ongoing rebalance and currentTopology==newTopology. The cancel request can't be executed, so a response saying so is returned.
The recipient of this response (the placement driver) must log this fact and run the same routine as for a usual rebalanceDone.)

> Cancel rebalance mechanism
> --------------------------
>
>                 Key: IGNITE-19087
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19087
>             Project: Ignite
>          Issue Type: Task
>            Reporter: Kirill Gusakov
>            Priority: Major
>              Labels: ignite-3
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
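
The two steps of the removed description (the revision-guarded write of the {{*.cancel}} key and the three responses of the replication group) can be illustrated with a small, single-threaded toy model. This is only a sketch under stated assumptions: {{MetaStorage}}, {{ReplicationGroup}}, {{CancelResult}} and all method names are hypothetical and are not Ignite APIs; a real meta storage would run the check and the write as one atomic invoke; and the behaviour for a request that matches none of the three listed cases (raising an error) is an assumption, since the description does not cover it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Tuple

Topology = Tuple[str, ...]  # list of peer names; a hypothetical representation


@dataclass
class MetaStorage:
    """Toy in-memory key-value store with per-key revisions (hypothetical)."""
    values: dict = field(default_factory=dict)
    revisions: dict = field(default_factory=dict)
    counter: int = 0

    def put(self, key, value) -> int:
        # Every write bumps a global counter and stamps the key with it.
        self.counter += 1
        self.values[key] = value
        self.revisions[key] = self.counter
        return self.counter

    def put_cancel(self, zone_id, input_revision, cancel_value) -> bool:
        # Write the cancel intent only if the pending key still has the
        # revision the caller observed; otherwise the rebalance it wanted
        # to cancel has already been replaced or finished.
        pending_key = f"{zone_id}.assignment.pending"
        if self.revisions.get(pending_key) != input_revision:
            return False
        self.put(f"{zone_id}.assignment.cancel", cancel_value)
        return True


class CancelResult(Enum):
    CANCELLED = "cancelled"                    # ongoing rebalance rolled back
    NOTHING_TO_CANCEL = "nothing_to_cancel"    # group is already on oldTopology
    ALREADY_DONE = "already_done"              # group is already on newTopology


@dataclass
class ReplicationGroup:
    current: Topology
    target: Optional[Topology] = None  # set while a rebalance is running

    def handle_cancel(self, old: Topology, new: Topology) -> CancelResult:
        if self.target is not None and self.current == old and self.target == new:
            self.target = None  # clean the configuration up back to oldTopology
            return CancelResult.CANCELLED
        if self.target is None and self.current == old:
            return CancelResult.NOTHING_TO_CANCEL
        if self.target is None and self.current == new:
            return CancelResult.ALREADY_DONE
        # Assumption: anything else is treated as a mismatched request.
        raise ValueError("cancel request does not match the group state")
```

In this model a caller would first read the revision returned by {{put}} for {{zoneId.assignment.pending}}, pass it to {{put_cancel}}, and only send the cancel request to the group when {{put_cancel}} returns true. On {{ALREADY_DONE}} the caller (the placement driver in the description) would log the fact and proceed as for a usual rebalanceDone.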