Allen315 opened a new issue, #329: URL: https://github.com/apache/kvrocks-controller/issues/329
Hi guys,

While analyzing the use of a controller to manage KVrocks clusters in a multi-AZ deployment, we identified a risk of split-brain scenarios. For example, in the diagram below, when a network partition occurs:

* The connection between the Load Balancer (LB) and Node N1 remains functional.
* But the connection between the Controller Master and Node N1 fails due to network issues.

This causes the Controller Master to mistakenly mark N1 as faulty and trigger a failover, promoting N1's slave to master. However, N1 is actually healthy and continues to accept writes from the LB.

Result: Two active masters (split-brain) exist for the same shard, leading to data inconsistency.

<img width="982" height="648" alt="Image" src="https://github.com/user-attachments/assets/091a5063-878b-4ea3-890f-c61d1d4e008e" />
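To make the sequence concrete, here is a minimal Go sketch of the scenario described above. It is only a toy model, not kvrocks-controller code: the `Node`, `Link`, and `Network` types, the `failover` and `masters` functions, and the node name `N1-slave` are all hypothetical names made up for illustration. It also assumes the controller decides purely from its own reachability to the master, which is how the issue describes the behavior.

```go
package main

import "fmt"

// Node is a hypothetical, simplified model of one KVrocks node in a shard.
type Node struct {
	Name string
	Role string // "master" or "slave"
}

// Link identifies a network path from an observer to a node.
type Link struct {
	From string
	To   string
}

// Network is a hypothetical reachability table.
// A missing entry is treated as unreachable.
type Network map[Link]bool

// failover models the decision described in the issue: the controller
// promotes the slave when it cannot reach the master itself.
// The old master is left untouched, because the controller cannot
// reach it to demote it.
func failover(net Network, master, slave *Node) bool {
	if net[Link{From: "controller", To: master.Name}] {
		return false // master looks healthy from the controller's side
	}
	slave.Role = "master"
	return true
}

// masters returns the names of all nodes that currently act as master.
func masters(nodes []*Node) []string {
	var out []string
	for _, n := range nodes {
		if n.Role == "master" {
			out = append(out, n.Name)
		}
	}
	return out
}

func main() {
	n1 := &Node{Name: "N1", Role: "master"}
	n1Slave := &Node{Name: "N1-slave", Role: "slave"}

	// Partition from the issue: LB -> N1 works, controller -> N1 does not.
	net := Network{
		Link{From: "lb", To: "N1"}:               true,
		Link{From: "controller", To: "N1"}:       false,
		Link{From: "controller", To: "N1-slave"}: true,
	}

	fmt.Println("failover triggered:", failover(net, n1, n1Slave))
	fmt.Println("LB can still write to N1:", net[Link{From: "lb", To: "N1"}])
	fmt.Println("masters in shard:", masters([]*Node{n1, n1Slave}))
}
```

In this model the run ends with both `N1` and `N1-slave` in the master role while the LB can still reach `N1`, which is the split-brain state the issue describes.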
