yuzegao opened a new issue, #3089:
URL: https://github.com/apache/kvrocks/issues/3089

   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/kvrocks/issues) and found no similar issues.
   
   
   ### Motivation
   
   In the following situations, a master-slave switchover in a Kvrocks cluster 
may cause split-brain:
   1. The Kvrocks process hangs and then recovers
   2. The network card on the Kvrocks host fails and then recovers
   3. A network partition isolating the Kvrocks host heals
   When the old master becomes reachable again, client requests may still be 
directed to it for a brief period, causing split-brain and data inconsistency.
   If clients access Kvrocks through a load balancer (LB), repointing the LB 
address to the new master node during a switchover ensures that client 
requests are no longer forwarded to the old master node.
   The Kvrocks cluster deployment architecture is as follows:
   
   <img width="1011" height="598" alt="Image" 
src="https://github.com/user-attachments/assets/fe3ed2a7-e747-452c-a983-0b075df22e86"
 />
   Benefits of this architecture:
   1. It covers most split-brain scenarios, including process restarts, process 
hangs, host network interface downtime, and network partitions.
   2. Failover through a load balancer (LB) completes significantly faster 
than client-side approaches such as dynamic domain names.
   This approach is used in both AWS ElastiCache and Azure Redis.
   
   ### Solution
   
   To ensure that client access to the Kvrocks cluster goes through the load 
balancer (LB), while internal Kvrocks cluster communication (master-slave 
synchronization and slot migration) keeps using local IPs, the following 
modifications are required:
   1. Add two optional parameters to the clusterx setnodes command to specify 
an LB host and port for each node.
   The following example shows that the LB address can be specified for some 
or all nodes:
   CLUSTERX SETNODES "ZrFserb4Mqi5dbyCLCMUm9zFXMNPhzKE4RbFauJa 10.190.28.10 
7115 master - 0-5460 \n qhbVLW9dc2VF9CZOmvBJOzzqkhNA3oHoD3NVWVyg 10.190.28.10 
7133 master - 5461-10921 \n 9cnjXnaNK5KfPtX01V0wn7ZtM7FDPnz7va9qPJBs 
10.190.28.10 7237 master - 10922-16383 **192.168.0.2 1222**\n 
fuCHh8Ru9tFSY313QPfBcPQ3QeP63Z4rWpqpyaOl 10.190.28.10 7255 slave 
ZrFserb4Mqi5dbyCLCMUm9zFXMNPhzKE4RbFauJa \n 
cIqdj5C37cMrr7r7ydZDZ8zm1T1vPLMWj8516rrF 10.190.28.10 6741 slave 
qhbVLW9dc2VF9CZOmvBJOzzqkhNA3oHoD3NVWVyg \n 
3D8JRN8YVONqRU57jP99E3JdH2QnLHzZpFN4rcQQ 10.190.28.10 6765 slave 
9cnjXnaNK5KfPtX01V0wn7ZtM7FDPnz7va9qPJBs **192.168.0.1 1234**" 1754120940
   
   2. When persisting nodes.conf to a local file, add the LB address 
information to the file. When the Kvrocks process starts, it also loads the 
corresponding LB address information.
   3. When executing the cluster nodes/slots/replicas commands, if a node has 
an LB host configured, return the LB address in place of the node's local 
address.
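   
   The three changes above can be sketched roughly as follows. This is only an 
illustration in Python, not the Kvrocks C++ implementation: the names 
`ClusterNode`, `parse_node_line`, `to_line`, and `advertised_address` are 
hypothetical, and the sketch assumes the optional LB host and port always 
appear as the last two fields of a node line, after any slot ranges, as in 
the example command.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# A slot token is a single slot ("42") or a range ("0-5460").
# Assumption: an LB host is never purely numeric, so it cannot be
# mistaken for a slot token.
_SLOT_TOKEN = re.compile(r"^\d+(-\d+)?$")


@dataclass
class ClusterNode:
    node_id: str
    host: str            # local address, used for replication and migration
    port: int
    role: str            # "master" or "slave"
    master_id: str       # "-" for a master
    slots: List[str] = field(default_factory=list)
    lb_host: Optional[str] = None   # proposed optional field
    lb_port: Optional[int] = None   # proposed optional field

    def advertised_address(self) -> Tuple[str, int]:
        """Address reported to clients (step 3): the LB address if set."""
        if self.lb_host is not None and self.lb_port is not None:
            return self.lb_host, self.lb_port
        return self.host, self.port

    def to_line(self) -> str:
        """Serialize the node, LB fields included, for persistence (step 2)."""
        parts = [self.node_id, self.host, str(self.port), self.role, self.master_id]
        parts += self.slots
        if self.lb_host is not None and self.lb_port is not None:
            parts += [self.lb_host, str(self.lb_port)]
        return " ".join(parts)


def parse_node_line(line: str) -> ClusterNode:
    """Parse one node line with optional trailing LB host and port (step 1)."""
    tokens = line.split()
    if len(tokens) < 5:
        raise ValueError(f"too few fields in node line: {line!r}")
    node_id, host, port, role, master_id = tokens[:5]
    rest = tokens[5:]

    # Consume slot ranges first; whatever remains must be the LB pair.
    slots: List[str] = []
    while rest and _SLOT_TOKEN.match(rest[0]):
        slots.append(rest.pop(0))

    if len(rest) not in (0, 2):
        raise ValueError(f"expected '<lb_host> <lb_port>' at end of line: {line!r}")
    lb_host, lb_port = (rest[0], int(rest[1])) if rest else (None, None)
    return ClusterNode(node_id, host, int(port), role, master_id,
                       slots, lb_host, lb_port)


# Usage with two node lines taken from the example command above.
with_lb = parse_node_line(
    "9cnjXnaNK5KfPtX01V0wn7ZtM7FDPnz7va9qPJBs 10.190.28.10 7237 "
    "master - 10922-16383 192.168.0.2 1222"
)
without_lb = parse_node_line(
    "ZrFserb4Mqi5dbyCLCMUm9zFXMNPhzKE4RbFauJa 10.190.28.10 7115 master - 0-5460"
)
print(with_lb.advertised_address())     # LB address is reported to clients
print(without_lb.advertised_address())  # falls back to the local address
```

   Keeping the local address and the LB address as separate fields means 
replication and slot migration can keep using `host`/`port`, while only the 
client-facing cluster commands consult `advertised_address()`.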
   
   ### Are you willing to submit a PR?
   
   - [x] I'm willing to submit a PR!

