yuzegao opened a new issue, #3089: URL: https://github.com/apache/kvrocks/issues/3089
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/kvrocks/issues) and found no similar issues.

### Motivation

In the following situations, a master-slave switchover in a Kvrocks cluster may cause split-brain:

1. The Kvrocks process hangs and then recovers
2. The network card on the Kvrocks host fails and then recovers
3. A network partition affecting the Kvrocks host heals

When the old master becomes reachable again, client requests may be directed to it for a brief period, causing split-brain and data inconsistency.

If clients access Kvrocks through a load balancer (LB), then when a cluster switchover occurs, repointing the LB address to the new master node ensures that client requests are not forwarded to the old master node.

The Kvrocks cluster deployment architecture is as follows:

<img width="1011" height="598" alt="Image" src="https://github.com/user-attachments/assets/fe3ed2a7-e747-452c-a983-0b075df22e86" />

Benefits of this architecture:

1. It covers most split-brain scenarios, including process restarts, process hangs, host network interface downtime, and network partitions.
2. Using an LB speeds up failover; it is significantly faster than approaches such as dynamic domain names on the client side.

This approach is used in both AWS ElastiCache and Azure Redis.

### Solution

To ensure that client access to the Kvrocks cluster goes through the LB, while internal cluster communication (master-slave synchronization and slot migration) keeps using the local IP, the following modifications are required:

1. Add two optional parameters to the `CLUSTERX SETNODES` command to specify an LB host and port for each node.
   The following example shows that an LB can be specified for some or all nodes (the LB host and port are shown in bold):

   CLUSTERX SETNODES "ZrFserb4Mqi5dbyCLCMUm9zFXMNPhzKE4RbFauJa 10.190.28.10 7115 master - 0-5460 \n qhbVLW9dc2VF9CZOmvBJOzzqkhNA3oHoD3NVWVyg 10.190.28.10 7133 master - 5461-10921 \n 9cnjXnaNK5KfPtX01V0wn7ZtM7FDPnz7va9qPJBs 10.190.28.10 7237 master - 10922-16383 **192.168.0.2 1222**\n fuCHh8Ru9tFSY313QPfBcPQ3QeP63Z4rWpqpyaOl 10.190.28.10 7255 slave ZrFserb4Mqi5dbyCLCMUm9zFXMNPhzKE4RbFauJa \n cIqdj5C37cMrr7r7ydZDZ8zm1T1vPLMWj8516rrF 10.190.28.10 6741 slave qhbVLW9dc2VF9CZOmvBJOzzqkhNA3oHoD3NVWVyg \n 3D8JRN8YVONqRU57jP99E3JdH2QnLHzZpFN4rcQQ 10.190.28.10 6765 slave 9cnjXnaNK5KfPtX01V0wn7ZtM7FDPnz7va9qPJBs **192.168.0.1 1234**" 1754120940

2. When persisting `nodes.conf` to the local file, include the LB address information. When the Kvrocks process starts, it also loads the corresponding LB address information.
3. When executing the `CLUSTER NODES`, `CLUSTER SLOTS`, and `CLUSTER REPLICAS` commands, if a node has an LB host configured, report the LB host in place of the node's own address.

### Are you willing to submit a PR?

- [x] I'm willing to submit a PR!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
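For reference, the node-line format proposed in the Solution section and the address substitution described in its item 3 could be sketched roughly as follows. This is an illustrative Python sketch, not Kvrocks code: `Node`, `parse_node_line`, `parse_nodes`, and `client_address` are hypothetical names, and the parsing rule (slot ranges first, then an optional trailing LB host and port) is an assumption inferred from the `CLUSTERX SETNODES` example in the issue.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    node_id: str
    host: str                  # internal address (replication, slot migration)
    port: int
    role: str                  # "master" or "slave"
    master_id: Optional[str]   # None when the input field is "-"
    slots: List[str]           # slot ranges as written, e.g. ["0-5460"]
    lb_host: Optional[str] = None   # hypothetical optional LB host
    lb_port: Optional[int] = None   # hypothetical optional LB port

    def client_address(self) -> Tuple[str, int]:
        """Address to report to clients: the LB if configured, else the node."""
        if self.lb_host is not None and self.lb_port is not None:
            return self.lb_host, self.lb_port
        return self.host, self.port


def _is_slot_token(token: str) -> bool:
    # Accepts "5" or "0-5460"; rejects host names and IP addresses.
    parts = token.split("-")
    return 1 <= len(parts) <= 2 and all(p.isdigit() for p in parts)


def parse_node_line(line: str) -> Node:
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"too few fields: {line!r}")
    node_id, host, port, role, master = fields[:5]
    rest = fields[5:]

    # Assumption: slot ranges come first; anything left over must be
    # exactly one "<lb_host> <lb_port>" pair.
    slots: List[str] = []
    while rest and _is_slot_token(rest[0]):
        slots.append(rest.pop(0))
    if len(rest) not in (0, 2):
        raise ValueError(f"expected '<lb_host> <lb_port>', got: {rest!r}")

    lb_host = rest[0] if rest else None
    lb_port = int(rest[1]) if rest else None
    return Node(node_id, host, int(port), role,
                None if master == "-" else master, slots, lb_host, lb_port)


def parse_nodes(text: str) -> List[Node]:
    # One node per line; blank lines are ignored.
    return [parse_node_line(ln) for ln in text.split("\n") if ln.strip()]
```

With this sketch, a node configured with an LB reports the LB address from `client_address()`, while `host` and `port` still hold the internal address used for replication and slot migration. Nodes without an LB fall back to their own address, which matches the "some or all" behaviour in the example.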
