RiversJin opened a new issue, #268:
URL: https://github.com/apache/kvrocks-controller/issues/268

   In the `tryUpdateMigrationStatus` logic, after a successful migration, the 
controller first updates the topology in the kvrocks cluster via `SetSlot`, 
then updates the etcd data using `clusterStore.UpdateCluster`. However, there 
appears to be a potential vulnerability here:  
   - If `SetSlot` succeeds but a network failure prevents `UpdateCluster` from 
completing (or worse, if `SetSlot` partially succeeds with some nodes updated 
and others failing), this could leave the kvrocks cluster version **higher** 
than the cluster version stored in etcd.  
   - Even if another controller takes over from the failed one, it would be 
unable to execute new operations because subsequent "versions" issued by the 
controller would no longer exceed the version already in kvrocks.  
   
   To address this, I suggest reversing the order: **update etcd first**, then 
propagate the new topology to kvrocks.  
   As a side note, if this approach is adopted, the `clusterx set slot` command 
in kvrocks might become redundant, as we could fully update the post-migration 
topology using `setnodes` instead.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to