[ https://issues.apache.org/jira/browse/IGNITE-23572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mirza Aliev updated IGNITE-23572:
---------------------------------

Description:

h3. Motivation

[IEP-131|https://cwiki.apache.org/confluence/display/IGNITE/IEP-131%3A+Partition+Majority+Unavailability+Handling]

In HA mode, for scale-up situations:
# Let's say we have [A, B, C] as the partition assignments, and nodes B and C left.
# The Raft group was forcibly narrowed to [A]; after that, node B returned, so we must enhance the stable assignments to [A, B].
# In terms of DZ.scale up, nothing changed within the DZ.scale up time window, so the data nodes will be the same. That could suggest we don't need to schedule a new rebalance to enhance the stable assignments to [A, B], but we actually do. (Note that the DZ.scale down timer is quite large and hasn't even elapsed.)

Proposed enhancements (a sketch of the decision step follows this list):
# Data nodes are rewritten on scale up even if they are the same.
# When we decide whether to trigger a rebalance after a data nodes change, we calculate the assignments and apply a node-aliveness filter to them. If the actual stablePartAssignmentsKey differs from the filtered assignments, we schedule a rebalance.
# In our example, the calculated assignments will be [A, B, C]; we filter them to [A, B] and schedule a new rebalance to enhance stablePartAssignmentsKey.
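A minimal, self-contained sketch of the decision step from item 2, assuming plain Java collections; the names here (rebalanceNeeded, calculatedAssignments, aliveNodes, stableAssignments, and the main-method example) are illustrative placeholders, not the actual Ignite 3 internals or the real stablePartAssignmentsKey handling:

{code:java}
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Sketch of the proposed logic: data nodes may be unchanged, but a rebalance
 * is still scheduled when the stable assignments differ from the calculated
 * assignments filtered by node aliveness. Names are illustrative only.
 */
public class RebalanceDecisionSketch {

    /** Decides whether a rebalance must be scheduled for a partition. */
    static boolean rebalanceNeeded(Set<String> calculatedAssignments,
                                   Set<String> aliveNodes,
                                   Set<String> stableAssignments) {
        // Apply the aliveness filter to the freshly calculated assignments.
        Set<String> filtered = calculatedAssignments.stream()
                .filter(aliveNodes::contains)
                .collect(Collectors.toSet());

        // Schedule a rebalance only if the current stable assignments
        // differ from the filtered set.
        return !filtered.equals(stableAssignments);
    }

    public static void main(String[] args) {
        // The example from the description: calculated = [A, B, C],
        // alive = [A, B] (B returned after the forced narrowing), stable = [A].
        Set<String> calculated = Set.of("A", "B", "C");
        Set<String> alive = Set.of("A", "B");
        Set<String> stable = Set.of("A");

        // Filtered assignments [A, B] differ from stable [A], so a rebalance is needed.
        System.out.println("Rebalance needed: " + rebalanceNeeded(calculated, alive, stable));
    }
}
{code}

In the example, the filtered set [A, B] differs from the stable set [A], so a rebalance is scheduled even though the data nodes themselves did not change.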
h3. Definition of done
* The corresponding approach must be implemented, so that nodes that returned after a majority loss can be brought back into the stable assignments.


> Change rebalance scheduling when data nodes are changed
> --------------------------------------------------------
>
>                 Key: IGNITE-23572
>                 URL: https://issues.apache.org/jira/browse/IGNITE-23572
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Mirza Aliev
>            Assignee: Mirza Aliev
>            Priority: Major
>              Labels: ignite-3
>


--
This message was sent by Atlassian Jira
(v8.20.10#820010)