[ https://issues.apache.org/jira/browse/KAFKA-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16777257#comment-16777257 ]
GEORGE LI commented on KAFKA-6794:
----------------------------------

Hi [~viktorsomogyi], I think this "Incremental Reassignment" is different from the "Planned Future Change" section of KIP-236. That section tries to overcome the current limitation that only one batch of reassignments can run in /admin/reassign_partitions at a time.

For example, suppose a batch of 50 reassignments is submitted and 49 of them complete, leaving one long-running reassignment pending in /admin/reassign_partitions. Currently, a new batch cannot be submitted until every reassignment in /admin/reassign_partitions has completed and the node has been removed from ZK. If the cluster is otherwise mostly idle, this wastes resources, because no new reassignments can be submitted in the meantime.

The proposal is to allow new batches to be submitted to a queue (a ZK node) and merged into /admin/reassign_partitions. If the same topic/partition appears in both the new queue and the current /admin/reassign_partitions, the merge will try to use Cancel Reassignments to resolve the conflict.

> Support for incremental replica reassignment
> --------------------------------------------
>
>                 Key: KAFKA-6794
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6794
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jason Gustafson
>            Assignee: Viktor Somogyi-Vass
>            Priority: Major
>
> Say you have a replication factor of 4 and you trigger a reassignment which
> moves all replicas to new brokers. Now 8 replicas are fetching at the same
> time which means you need to account for 8 times the current producer load
> plus the catch-up replication. To make matters worse, the replicas won't all
> become in-sync at the same time; in the worst case, you could have 7 replicas
> in-sync while one is still catching up. Currently, the old replicas won't be
> disabled until all new replicas are in-sync. This makes configuring the
> throttle tricky since ISR traffic is not subject to it.
>
> Rather than trying to bring all 4 new replicas online at the same time, a
> friendlier approach would be to do it incrementally: bring one replica
> online, bring it in-sync, then remove one of the old replicas. Repeat until
> all replicas have been changed. This would reduce the impact of a
> reassignment and make configuring the throttle easier at the cost of a slower
> overall reassignment.
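The queue-and-merge idea from the comment (merging a newly submitted batch into the in-progress /admin/reassign_partitions batch) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not how the Kafka controller or KIP-236 implements it. A batch is modelled as a dict from a (topic, partition) pair to a replica list. The names merge_reassignments, in_progress and queued are hypothetical. A conflict is assumed to mean that the pending move is cancelled and the queued target replaces it.

```python
from typing import Dict, List, Set, Tuple

TopicPartition = Tuple[str, int]
Assignment = Dict[TopicPartition, List[int]]


def merge_reassignments(
    in_progress: Assignment, queued: Assignment
) -> Tuple[Assignment, Set[TopicPartition]]:
    """Hypothetical merge of a queued batch into the in-progress batch.

    Partitions present in only one batch are carried over unchanged.
    Partitions present in both with different targets are conflicts: the
    pending move is marked for cancellation (assumed semantics) and the
    queued target wins.
    """
    merged: Assignment = dict(in_progress)
    to_cancel: Set[TopicPartition] = set()
    for tp, replicas in queued.items():
        if tp in in_progress and in_progress[tp] != replicas:
            # Same topic/partition in both batches: cancel the pending move first.
            to_cancel.add(tp)
        merged[tp] = list(replicas)
    return merged, to_cancel


if __name__ == "__main__":
    pending = {("orders", 7): [5, 6, 7]}           # the one long-running move
    new_batch = {("orders", 7): [8, 9, 10], ("clicks", 0): [1, 2, 3]}
    merged, cancel = merge_reassignments(pending, new_batch)
    print(merged, cancel)
```

Only partitions that appear in both batches trigger a cancellation. Everything else in the queued batch can start without waiting for the long-running move to finish.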
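The incremental approach described in the issue can be illustrated by listing the replica sets that a single partition passes through. This is a minimal sketch that assumes the replication factor does not change. It also assumes each "add" step is followed by waiting for the new replica to join the ISR before an old replica is dropped. The sketch only records the states and does not talk to a cluster. The function name incremental_plan is hypothetical.

```python
from typing import List


def incremental_plan(current: List[int], target: List[int]) -> List[List[int]]:
    """Return the replica sets a partition moves through when replicas are
    swapped one at a time: add one new replica, (wait for it to be in-sync,)
    then remove one old replica. Assumes no duplicate broker ids and an
    unchanged replication factor."""
    if len(current) != len(target):
        raise ValueError("sketch assumes the replication factor is unchanged")
    replicas = list(current)
    to_add = [b for b in target if b not in current]
    to_remove = [b for b in current if b not in target]
    states = [list(replicas)]
    for new, old in zip(to_add, to_remove):
        replicas.append(new)           # new replica starts fetching
        states.append(list(replicas))  # it must be in-sync before the next step
        replicas.remove(old)           # only then is one old replica dropped
        states.append(list(replicas))
    return states


if __name__ == "__main__":
    for step in incremental_plan([1, 2, 3, 4], [5, 6, 7, 8]):
        print(step)
```

For the replication-factor-4 example, moving [1, 2, 3, 4] to [5, 6, 7, 8] this way never assigns more than 5 replicas at once, compared with 8 when all new replicas are added together. The trade-off is 4 sequential catch-up rounds instead of one. The sketch ignores replica order, so in general the final list may not match the target's order. In Kafka, the first replica in the list is the preferred leader.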