[ 
https://issues.apache.org/jira/browse/KAFKA-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777257#comment-16777257
 ] 

GEORGE LI commented on KAFKA-6794:
----------------------------------

Hi [~viktorsomogyi],  

I think this "Incremental Reassignment" is different from the "Planned 
Future Change" section of KIP-236. That section is basically trying to 
overcome the current limitation that only one batch of reassignments can run 
in /admin/reassign_partitions at a time. 

For example, suppose a batch of 50 reassignments is submitted and 49 of them 
complete, leaving one long-running reassignment pending in 
/admin/reassign_partitions. Currently, a new batch cannot be submitted until 
every reassignment in /admin/reassign_partitions has completed and the node 
is removed from ZK. If the cluster is otherwise mostly idle, this wastes 
resources, because no new reassignments can be submitted in the meantime. 

The proposal is to allow a new batch to be submitted to a queue (a ZK node) 
and to merge the new assignments into /admin/reassign_partitions. If the same 
topic/partition appears in both the new queue and the current 
/admin/reassign_partitions, this will try to use Cancel Reassignments to 
resolve the conflict. 
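
A minimal sketch of the merge described above, in Python rather than Kafka's own code. The function name merge_queued and the dict-based representation are hypothetical, and it assumes that on a conflict the queued assignment supersedes the in-flight one (which would first have to be cancelled). The real controller logic and ZK layout may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-ins for the contents of the ZK nodes.
TopicPartition = Tuple[str, int]
Assignment = Dict[TopicPartition, List[int]]


def merge_queued(current: Assignment, queued: Assignment):
    """Merge queued reassignments into the in-flight ones.

    Returns (merged, to_cancel). to_cancel lists the topic/partitions that
    appear in both maps with different target replicas; under the assumption
    stated above, their in-flight reassignment is cancelled and replaced.
    """
    merged = dict(current)
    to_cancel = []
    for tp, replicas in queued.items():
        if tp in merged and merged[tp] != replicas:
            to_cancel.append(tp)  # conflict: same topic/partition in both
        merged[tp] = list(replicas)
    return merged, sorted(to_cancel)


current = {("topicA", 0): [1, 2, 3]}
queued = {("topicA", 0): [4, 5, 6], ("topicB", 1): [1, 2]}
merged, to_cancel = merge_queued(current, queued)
```

Here topicA-0 is in both maps, so it ends up in to_cancel, while topicB-1 is simply added to the merged set.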

 

> Support for incremental replica reassignment
> --------------------------------------------
>
>                 Key: KAFKA-6794
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6794
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jason Gustafson
>            Assignee: Viktor Somogyi-Vass
>            Priority: Major
>
> Say you have a replication factor of 4 and you trigger a reassignment which 
> moves all replicas to new brokers. Now 8 replicas are fetching at the same 
> time which means you need to account for 8 times the current producer load 
> plus the catch-up replication. To make matters worse, the replicas won't all 
> become in-sync at the same time; in the worst case, you could have 7 replicas 
> in-sync while one is still catching up. Currently, the old replicas won't be 
> disabled until all new replicas are in-sync. This makes configuring the 
> throttle tricky since ISR traffic is not subject to it.
> Rather than trying to bring all 4 new replicas online at the same time, a 
> friendlier approach would be to do it incrementally: bring one replica 
> online, bring it in-sync, then remove one of the old replicas. Repeat until 
> all replicas have been changed. This would reduce the impact of a 
> reassignment and make configuring the throttle easier at the cost of a slower 
> overall reassignment.
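
The incremental approach in the description above can be sketched as a sequence of intermediate replica sets. This is only an illustration in Python: incremental_steps is a hypothetical helper, not part of Kafka, and it assumes each added replica has joined the ISR before the next step is applied.

```python
from itertools import zip_longest


def incremental_steps(old, new):
    """Return the intermediate replica lists for an incremental move.

    Each round adds one new replica and then, once it is assumed to be
    in-sync, drops one old replica.
    """
    to_add = [r for r in new if r not in old]
    to_remove = [r for r in old if r not in new]
    current = list(old)
    steps = []
    for add, remove in zip_longest(to_add, to_remove):
        if add is not None:
            current = current + [add]
            steps.append(list(current))  # wait here until `add` is in-sync
        if remove is not None:
            current = [r for r in current if r != remove]
            steps.append(list(current))
    return steps


steps = incremental_steps([1, 2, 3, 4], [5, 6, 7, 8])
peak = max(len(s) for s in steps)
```

For the replication-factor-4 example, the replica set never grows beyond 5 brokers, compared with 8 when all new replicas are brought online at once.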



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
