Mirza Aliev updated IGNITE-24513:
---------------------------------
    Epic Link: IGNITE-23438

> HA: stable is not expected after recovered availability and node restarts
> --------------------------------------------------------------------------
>
>                 Key: IGNITE-24513
>                 URL: https://issues.apache.org/jira/browse/IGNITE-24513
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Mirza Aliev
>            Priority: Major
>              Labels: ignite-3
>
> See {{ItHighAvailablePartitionsRecoveryByFilterUpdateTest#testSeveralHaResetsAndSomeNodeRestart}}, the test that covers this scenario.
>
> *Precondition*
> * Create a zone in HA mode on a 7-node cluster (A, B, C, D, E, F, G) - phase 1 (see the zone-setup sketch below).
> * Insert data and wait for replication to all nodes.
> * Stop a majority of the nodes (4 nodes: A, B, C, D).
> * Wait for the partition to become available on (E, F, G); no new writes - phase 2.
> * Stop a majority of the remaining nodes once again (E, F).
> * Wait for the partition to become available on (G); no new writes - phase 3.
> * Stop the last node, G.
> * Start one node from phase 1: A.
> * Start one node from phase 3: G.
> * Start one node from phase 2: E.
> * No data should be lost (reads from the partition on A and E must be consistent with G).
>
> *Result*
> Before the last step, we check that the stable assignment is (A, G, E), but the check times out with the stable assignment equal to (G).
>
> *Expected result*
> The stable assignment is (A, G, E) after A, G, and E are restarted.
>
> h3. Implementation notes
>
> First of all, for debug purposes, I would simplify the test to restart only A and G and assert that the stable assignment is (A, G); see the test sketch below.
>
> The second thought is to check whether scale-up is scheduled after A and G are restarted, and also to check that there are no redundant partition reset actions. I suspect we do trigger a reset after the nodes are restarted, because we compute the majority from the replica factor rather than from the actual stable assignment size; see the majority-check sketch below.
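For context, a minimal sketch of the precondition's zone setup, assuming it is driven through the public Ignite 3 SQL API. The zone and table names are illustrative, and the exact {{WITH}} options (in particular {{CONSISTENCY_MODE='HIGH_AVAILABILITY'}} as the switch that enables HA mode) may differ between Ignite 3 versions:

{code:java}
import org.apache.ignite.Ignite;

/**
 * Sketch of the test precondition: an HA zone spanning all 7 nodes.
 * Zone/table names are illustrative; the exact DDL options may differ
 * between Ignite 3 versions.
 */
final class HaZoneSetup {
    static void createHaZoneAndTable(Ignite ignite) {
        // CONSISTENCY_MODE='HIGH_AVAILABILITY' is assumed to be the option
        // that enables automatic partition reset for the zone.
        ignite.sql().execute(null,
                "CREATE ZONE IF NOT EXISTS HA_ZONE "
                        + "WITH REPLICAS=7, PARTITIONS=1, "
                        + "CONSISTENCY_MODE='HIGH_AVAILABILITY', "
                        + "STORAGE_PROFILES='default'");

        // A table placed in the HA zone; data inserted here is expected to
        // replicate to all 7 nodes before any node is stopped.
        ignite.sql().execute(null,
                "CREATE TABLE IF NOT EXISTS TEST (ID INT PRIMARY KEY, VAL VARCHAR) "
                        + "ZONE HA_ZONE");
    }
}
{code}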
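A sketch of the simplified debug scenario suggested in the implementation notes. Only the phase structure comes from this issue; {{stopNodes}}, {{startNode}}, and {{waitForStableAssignments}} are hypothetical stand-ins for the utilities the real test inherits from its base class:

{code:java}
import java.util.Set;
import org.junit.jupiter.api.Test;

/**
 * Simplified debug scenario: restart only A and G and assert that the
 * stable assignment becomes (A, G). All helper methods are hypothetical
 * stand-ins for the real test utilities.
 */
class HaResetRestartDebugTest {
    @Test
    void stableContainsBothRestartedNodes() throws Exception {
        // Phase 1: 7-node HA zone, data replicated everywhere (setup omitted).

        // Phase 2: lose the majority A, B, C, D; HA reset shrinks stable to (E, F, G).
        stopNodes(Set.of("A", "B", "C", "D"));
        waitForStableAssignments(Set.of("E", "F", "G"));

        // Phase 3: lose E and F as well; HA reset shrinks stable to (G).
        stopNodes(Set.of("E", "F"));
        waitForStableAssignments(Set.of("G"));

        // Stop the last node, then bring back one node from phase 1 and phase 3.
        stopNodes(Set.of("G"));
        startNode("A");
        startNode("G");

        // Expected: scale-up is scheduled and the stable assignment grows to (A, G).
        // Observed per this issue: it stays (G), so a wait like this times out.
        waitForStableAssignments(Set.of("A", "G"));
    }

    // Hypothetical helpers; the real test inherits equivalents from its base class.
    private void stopNodes(Set<String> names) { /* ... */ }
    private void startNode(String name) { /* ... */ }
    private void waitForStableAssignments(Set<String> expected) { /* ... */ }
}
{code}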
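And a minimal illustration of the suspected root cause. The names are illustrative rather than actual Ignite code: once two HA resets have shrunk the stable assignment to a single node, a majority threshold derived from the configured replica factor (7) still demands 4 alive nodes, so a restart of that node looks like a lost majority and schedules a redundant reset, while a threshold derived from the actual stable size does not:

{code:java}
import java.util.Set;

/**
 * Illustration of the suspected majority-check bug (names are illustrative,
 * not actual Ignite code).
 */
final class MajorityCheck {
    /** Suspected current behavior: majority derived from the configured replica factor. */
    static boolean hasMajorityByReplicaFactor(int replicaFactor, Set<String> aliveStableNodes) {
        int majority = replicaFactor / 2 + 1;
        return aliveStableNodes.size() >= majority;
    }

    /** Proposed check: majority derived from the actual stable assignment size. */
    static boolean hasMajorityByStableSize(Set<String> stable, Set<String> aliveStableNodes) {
        int majority = stable.size() / 2 + 1;
        return aliveStableNodes.size() >= majority;
    }

    public static void main(String[] args) {
        // After two HA resets the stable assignment is just (G); then G restarts.
        Set<String> stable = Set.of("G");
        Set<String> alive = Set.of("G");

        // Replica-factor check: 1 alive < 4 required -> looks like a lost
        // majority, so a redundant partition reset would be scheduled.
        System.out.println(hasMajorityByReplicaFactor(7, alive)); // false

        // Stable-size check: 1 alive >= 1 required -> majority intact, no reset.
        System.out.println(hasMajorityByStableSize(stable, alive)); // true
    }
}
{code}

If the reset path indeed uses the replica-factor variant, switching it to the stable-size variant would avoid the redundant reset and let the scheduled scale-up restore (A, G, E) after the restarts.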