[ https://issues.apache.org/jira/browse/KAFKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372870#comment-15372870 ]
Ralph Weires commented on KAFKA-1464:
-------------------------------------

Another related idea, since the consumer rebalancing issues we hit during maintenance drove me up the wall yesterday... I'm just desperately looking for a way to get this stabilized (on our v0.8.2.1) ;)

Wouldn't a (manual and temporary) modification of the partition assignment also be a viable option to prevent a given node from becoming leader for any partitions? I mean, could I run kafka-reassign-partitions.sh with a customized partition assignment that wouldn't actually reassign any partitions to different brokers, but would merely change the replica *order* for several of the partitions, such that the node in question is no longer the first replica for any partition? If I understand it right, the controller will always prefer the first replica as leader when balancing, so I'd just need to make sure that my node isn't the first replica for anything. All of this would be temporary of course; after the maintenance I'd restore the original partition assignment.

Should this work, or would you expect specific problems with this workaround...?

Also: let me know if this rather belongs on the mailing list, since admittedly it isn't really related to throttling... But as a side remark in this regard, I also tried throttling outside Kafka (i.e. on the network side, via wondershaper) in our problem case, but that didn't help. I agree this would need to be done within Kafka, i.e. to be able to separate out-of-sync replica recovery traffic from the rest.
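To make the replica reordering idea concrete, here is a minimal sketch in Python. It assumes the current assignment is already available in the JSON shape that kafka-reassign-partitions.sh reads via --reassignment-json-file (a "version" field plus a "partitions" list of topic/partition/replicas entries). The function name demote_broker, the topic name and the broker ids are made up for illustration. The sketch only rewrites the replica order; whether and when leadership actually moves afterwards (for example via a preferred replica election) is not covered here and would need to be verified on the cluster.

```python
import json


def demote_broker(assignment, broker_id):
    """Return a copy of `assignment` in which `broker_id` is never first replica.

    The replica *set* of each partition is unchanged; only the order differs.
    Partitions whose only replica is `broker_id` are left untouched, because
    there is no other replica that could be moved to the front.
    """
    partitions = []
    for entry in assignment["partitions"]:
        replicas = list(entry["replicas"])  # copy, do not mutate the input
        if len(replicas) > 1 and replicas[0] == broker_id:
            # Rotate the current first replica to the end of the list.
            replicas = replicas[1:] + replicas[:1]
        partitions.append({
            "topic": entry["topic"],
            "partition": entry["partition"],
            "replicas": replicas,
        })
    return {"version": assignment.get("version", 1), "partitions": partitions}


if __name__ == "__main__":
    # Hypothetical current assignment; broker 1 is the node under maintenance.
    current = {
        "version": 1,
        "partitions": [
            {"topic": "events", "partition": 0, "replicas": [1, 2, 3]},
            {"topic": "events", "partition": 1, "replicas": [2, 3, 1]},
        ],
    }
    print(json.dumps(demote_broker(current, 1), indent=2))
```

Keeping the original JSON around unchanged gives the file to feed back to the tool after the maintenance in order to restore the previous replica order.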
> Add a throttling option to the Kafka replication tool
> -----------------------------------------------------
>
>                 Key: KAFKA-1464
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1464
>             Project: Kafka
>          Issue Type: New Feature
>          Components: replication
>    Affects Versions: 0.8.0
>            Reporter: mjuarez
>            Assignee: Ben Stopford
>            Priority: Minor
>              Labels: replication, replication-tools
>             Fix For: 0.10.1.0
>
>
> When performing replication on new nodes of a Kafka cluster, the replication
> process will use all available resources to replicate as fast as possible.
> This causes performance issues (mostly disk IO and sometimes network
> bandwidth) when doing this in a production environment, in which you're
> trying to serve downstream applications, at the same time you're performing
> maintenance on the Kafka cluster.
> An option to throttle the replication to a specific rate (in either MB/s or
> activities/second) would help production systems to better handle maintenance
> tasks while still serving downstream applications.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)