Mirza Aliev updated IGNITE-24513:
---------------------------------
    Epic Link: IGNITE-23438

> HA: stable is not expected after recovered availability and node restarts
> --------------------------------------------------------------------------
>
>                 Key: IGNITE-24513
>                 URL: https://issues.apache.org/jira/browse/IGNITE-24513
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Mirza Aliev
>            Priority: Major
>              Labels: ignite-3
>
> See {{ItHighAvailablePartitionsRecoveryByFilterUpdateTest#testSeveralHaResetsAndSomeNodeRestart}}, the test that covers this scenario.
>
> *Precondition*
> * Create a zone in HA mode on a 7-node cluster (A, B, C, D, E, F, G) - phase 1 (see the zone-setup sketch below).
> * Insert data and wait for replication to all nodes.
> * Stop a majority of the nodes (4 nodes: A, B, C, D).
> * Wait for the partition to become available on (E, F, G); no new writes - phase 2.
> * Stop a majority of the remaining nodes once again (E, F).
> * Wait for the partition to become available on (G); no new writes - phase 3.
> * Stop the last node, G.
> * Start one node from phase 1: A.
> * Start one node from phase 3: G.
> * Start one node from phase 2: E.
> * No data should be lost (reads from the partition on A and E must be consistent with G).
>
> *Result*
> Before the last step, we check that the stable assignment is (A, G, E), but the check times out with the stable assignment equal to (G).
>
> *Expected result*
> The stable assignment is (A, G, E) after A, G, and E are restarted.
>
> h3. Implementation notes
>
> First of all, for debug purposes, I would simplify the test to restart only A and G and assert that the stable assignment is (A, G); see the test sketch below.
>
> The second thought is to check whether scale-up is scheduled after A and G are restarted, and also to check that there are no redundant partition reset actions. I suspect we do trigger a reset after the nodes are restarted, because we compute the majority from the replica factor rather than from the actual stable assignment size; see the majority-check sketch below.
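For context, a minimal sketch of the precondition's zone setup, assuming it is driven through the public Ignite 3 SQL API. The zone and table names are illustrative, and the exact {{WITH}} options (in particular {{CONSISTENCY_MODE='HIGH_AVAILABILITY'}} as the switch that enables HA mode) may differ between Ignite 3 versions:

{code:java}
import org.apache.ignite.Ignite;

/**
 * Sketch of the test precondition: an HA zone spanning all 7 nodes.
 * Zone/table names are illustrative; the exact DDL options may differ
 * between Ignite 3 versions.
 */
final class HaZoneSetup {
    static void createHaZoneAndTable(Ignite ignite) {
        // CONSISTENCY_MODE='HIGH_AVAILABILITY' is assumed to be the option
        // that enables automatic partition reset for the zone.
        ignite.sql().execute(null,
                "CREATE ZONE IF NOT EXISTS HA_ZONE "
                        + "WITH REPLICAS=7, PARTITIONS=1, "
                        + "CONSISTENCY_MODE='HIGH_AVAILABILITY', "
                        + "STORAGE_PROFILES='default'");

        // A table placed in the HA zone; data inserted here is expected to
        // replicate to all 7 nodes before any node is stopped.
        ignite.sql().execute(null,
                "CREATE TABLE IF NOT EXISTS TEST (ID INT PRIMARY KEY, VAL VARCHAR) "
                        + "ZONE HA_ZONE");
    }
}
{code}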
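A sketch of the simplified debug scenario suggested in the implementation notes. Only the phase structure comes from this issue; {{stopNodes}}, {{startNode}}, and {{waitForStableAssignments}} are hypothetical stand-ins for the utilities the real test inherits from its base class:

{code:java}
import java.util.Set;
import org.junit.jupiter.api.Test;

/**
 * Simplified debug scenario: restart only A and G and assert that the
 * stable assignment becomes (A, G). All helper methods are hypothetical
 * stand-ins for the real test utilities.
 */
class HaResetRestartDebugTest {
    @Test
    void stableContainsBothRestartedNodes() throws Exception {
        // Phase 1: 7-node HA zone, data replicated everywhere (setup omitted).

        // Phase 2: lose the majority A, B, C, D; HA reset shrinks stable to (E, F, G).
        stopNodes(Set.of("A", "B", "C", "D"));
        waitForStableAssignments(Set.of("E", "F", "G"));

        // Phase 3: lose E and F as well; HA reset shrinks stable to (G).
        stopNodes(Set.of("E", "F"));
        waitForStableAssignments(Set.of("G"));

        // Stop the last node, then bring back one node from phase 1 and phase 3.
        stopNodes(Set.of("G"));
        startNode("A");
        startNode("G");

        // Expected: scale-up is scheduled and the stable assignment grows to (A, G).
        // Observed per this issue: it stays (G), so a wait like this times out.
        waitForStableAssignments(Set.of("A", "G"));
    }

    // Hypothetical helpers; the real test inherits equivalents from its base class.
    private void stopNodes(Set<String> names) { /* ... */ }
    private void startNode(String name) { /* ... */ }
    private void waitForStableAssignments(Set<String> expected) { /* ... */ }
}
{code}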
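And a minimal illustration of the suspected root cause. The names are illustrative rather than actual Ignite code: once two HA resets have shrunk the stable assignment to a single node, a majority threshold derived from the configured replica factor (7) still demands 4 alive nodes, so a restart of that node looks like a lost majority and schedules a redundant reset, while a threshold derived from the actual stable size does not:

{code:java}
import java.util.Set;

/**
 * Illustration of the suspected majority-check bug (names are illustrative,
 * not actual Ignite code).
 */
final class MajorityCheck {
    /** Suspected current behavior: majority derived from the configured replica factor. */
    static boolean hasMajorityByReplicaFactor(int replicaFactor, Set<String> aliveStableNodes) {
        int majority = replicaFactor / 2 + 1;
        return aliveStableNodes.size() >= majority;
    }

    /** Proposed check: majority derived from the actual stable assignment size. */
    static boolean hasMajorityByStableSize(Set<String> stable, Set<String> aliveStableNodes) {
        int majority = stable.size() / 2 + 1;
        return aliveStableNodes.size() >= majority;
    }

    public static void main(String[] args) {
        // After two HA resets the stable assignment is just (G); then G restarts.
        Set<String> stable = Set.of("G");
        Set<String> alive = Set.of("G");

        // Replica-factor check: 1 alive < 4 required -> looks like a lost
        // majority, so a redundant partition reset would be scheduled.
        System.out.println(hasMajorityByReplicaFactor(7, alive)); // false

        // Stable-size check: 1 alive >= 1 required -> majority intact, no reset.
        System.out.println(hasMajorityByStableSize(stable, alive)); // true
    }
}
{code}

If the reset path indeed uses the replica-factor variant, switching it to the stable-size variant would avoid the redundant reset and let the scheduled scale-up restore (A, G, E) after the restarts.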