[ 
https://issues.apache.org/jira/browse/KAFKA-7854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750875#comment-16750875
 ] 

Tom Bentley commented on KAFKA-7854:
------------------------------------

Just for context, my original thoughts about the API for partition reassignment 
were in KIP-179, but I view it as being a bit too basic for the sorts of things 
I'd like to be able to do. It was enough to replace the direct ZooKeeper access 
required by {{kafka-reassign-partitions.sh}}, but it didn't tackle some of the 
other annoying things about partition reassignment, like the fact that you 
can't start new reassignments when some are already in progress. For that 
reason I thought it better to work on a more ambitious set of APIs. KIPs 236 
and 240 were the result, which is also why I withdrew KIP-179. What I didn't publish 
were my ideas for the actual reassignment API. I still have those details 
buried away somewhere, but I have lacked the time to work on any of this stuff 
properly over the last year. I guess I could try to write up my thoughts about 
that API if people would be interested.

> Behavior change in controller picking up partition reassignment tasks since 
> 1.1.0
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-7854
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7854
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller
>            Reporter: Zhanxiang (Patrick) Huang
>            Priority: Major
>
> After [https://github.com/apache/kafka/pull/4143], the controller no longer 
> subscribes to data changes on /admin/reassign_partitions (to avoid 
> unnecessarily reloading the reassignment data after the controller itself 
> updates the znode), unlike previous Kafka versions. However, some systems 
> built around Kafka rely on the previous behavior to incrementally update the 
> list of partition reassignments, since Kafka does not natively support that.
>  
> For example, [cruise control|https://github.com/linkedin/cruise-control] can 
> rely on the previous behavior (the controller listening to data changes) to 
> maintain the reassignment concurrency by dynamically updating the data in the 
> reassignment znode, instead of waiting for the current batch to finish and 
> doing reassignment batch by batch. This can significantly reduce the 
> rebalance time in production clusters. Although directly updating the znode 
> can be viewed as an anti-pattern in the long term, it is necessary because 
> Kafka does not natively support incrementally submitting more reassignment 
> tasks. However, after our Kafka clusters migrated from 0.11 to 2.0, cruise 
> control no longer works because the controller behavior has changed. This 
> reveals the following problems:
>  * These behavior changes may be viewed as internal changes, so compatibility 
> is not guaranteed, but I think by convention people do view these as public 
> interfaces and rely on their compatibility. In this case, I think we should 
> clearly document the data contract for the partition reassignment task to 
> avoid misuse and to avoid controller changes that break the defined data 
> contract. There may be other cases (e.g. topic deletion) whose data contracts 
> need to be clearly defined, and we should keep this in mind when making 
> controller changes.
>  * Kafka does not natively support incrementally submitting more reassignment 
> tasks. If we want to support that nicely, we should consider changing how we 
> store the reassignment data: store it in child nodes and let the 
> controller listen for child node changes, similar to what we do for 
> /admin/delete_topics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
