[ https://issues.apache.org/jira/browse/IGNITE-23572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Lapin updated IGNITE-23572:
-------------------------------------
    Ignite Flags:   (was: Docs Required,Release Notes Required)

> Change rebalance scheduling when data nodes are changed
> -------------------------------------------------------
>
>                 Key: IGNITE-23572
>                 URL: https://issues.apache.org/jira/browse/IGNITE-23572
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Mirza Aliev
>            Assignee: Mirza Aliev
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> h3. Motivation
> [IEP-131|https://cwiki.apache.org/confluence/display/IGNITE/IEP-131%3A+Partition+Majority+Unavailability+Handling]
>  
>  
> In HA mode, for scale-up situations:
>  # Let’s say the partition assignments are [A, B, C], and nodes B and C leave.
>  # The Raft group is forcibly narrowed to [A]. After that, node B returns, so
> we must extend the stable assignments to [A, B].
>  # From the DZ.scale up perspective, nothing changed within the DZ.scale up
> time window, so the data nodes remain the same. This suggests that no new
> rebalance is needed to extend the stable assignments to [A, B], but one is
> actually required. (Note that the DZ.scale down timer is quite long and had
> not even elapsed, so B was never removed from the data nodes.)
> h3. Proposed enhancements
>  # Data nodes are rewritten on scale up, even if they have not changed.
>  # When deciding whether to trigger a rebalance after a data nodes change, we
> calculate the assignments and apply the node aliveness check filter to them.
> If the current stablePartAssignmentsKey value differs from the filtered
> assignments, we schedule a rebalance.
>  # In our example, the calculated assignments are [A, B, C]. Filtering them
> leaves [A, B], which differs from the stable [A], so we schedule a new
> rebalance to extend stablePartAssignmentsKey to [A, B].
> h3. Definition of done
> * The corresponding approach must be implemented, so that nodes that come
> back after a majority loss are returned to the stable assignments.
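The rebalance decision from the proposed enhancements can be sketched as follows. This is an illustrative model only, not Ignite code: `calculate_assignments`, `filter_alive` and `needs_rebalance` are hypothetical names, and `calculate_assignments` is a trivial stand-in (first N data nodes in sorted order) for the real partition distribution. Edge cases such as an empty filtered set are not handled.

```python
from typing import Iterable, Set


def calculate_assignments(data_nodes: Iterable[str], replicas: int) -> Set[str]:
    """Hypothetical stand-in for the real assignment calculation:
    takes the first `replicas` data nodes in sorted order."""
    return set(sorted(data_nodes)[:replicas])


def filter_alive(assignments: Set[str], alive_nodes: Set[str]) -> Set[str]:
    """Node aliveness check filter: keep only assigned nodes that are alive."""
    return assignments & alive_nodes


def needs_rebalance(data_nodes: Iterable[str], replicas: int,
                    alive_nodes: Iterable[str], stable: Iterable[str]) -> bool:
    """Return True if the stable assignments differ from the alive-filtered
    calculated assignments, even when the data nodes did not change."""
    calculated = calculate_assignments(data_nodes, replicas)
    filtered = filter_alive(calculated, set(alive_nodes))
    return filtered != set(stable)


if __name__ == "__main__":
    # Example from the issue: data nodes [A, B, C], B is back, C is still gone,
    # stable was forcibly narrowed to [A].
    print(needs_rebalance(["A", "B", "C"], 3, {"A", "B"}, {"A"}))
```

Running it prints `True`: the filtered target [A, B] differs from the stable [A], so a rebalance is scheduled. Once stable becomes [A, B], the same check returns `False`.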



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
