[ https://issues.apache.org/jira/browse/KAFKA-7854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749043#comment-16749043 ]
Adem Efe Gencer commented on KAFKA-7854:
----------------------------------------

*Relevant Cruise Control Issue*: [Issue-496|https://github.com/linkedin/cruise-control/issues/496].

*Key issue*: Kafka lacks the public APIs that [Cruise Control|https://github.com/linkedin/cruise-control] (CC) needs to manage a cluster. In particular, Kafka does not support dynamically adding replica reassignments while other reassignments are ongoing. Hence, CC cannot maintain a desired level of replica movement concurrency unless it uses the ZK API. The conundrum is that ZK APIs are considered internal, and AdminClient is recommended whenever possible (see [~ijuma]'s comment on [a similar use-case|https://github.com/linkedin/cruise-control/issues/285#issuecomment-410455705]). However, as of today the required public APIs do not exist, so silent behavior changes such as the one in [PR-4143|https://github.com/apache/kafka/pull/4143] break CC and adversely affect its users.

*Recommended solution*: Ideally, Kafka should provide AdminClient APIs to:
# Allow dynamically appending an additional set of replica reassignments regardless of whether there are ongoing replica reassignments, and
# Allow clean cancellation of ongoing replica movements (see [KAFKA-6304|https://issues.apache.org/jira/browse/KAFKA-6304]).


> Behavior change in controller picking up partition reassignment tasks since
> 1.1.0
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-7854
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7854
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller
>            Reporter: Zhanxiang (Patrick) Huang
>            Priority: Major
>
> After [https://github.com/apache/kafka/pull/4143], the controller no longer
> subscribes to data changes on /admin/reassign_partitions (in order to avoid
> unnecessarily reloading the reassignment data after the controller updates
> the znode), unlike previous Kafka versions. However, there are systems built
> around Kafka that rely on the previous behavior to incrementally update the
> list of partition reassignments, since Kafka does not natively support that.
>
> For example, [cruise control|https://github.com/linkedin/cruise-control] can
> rely on the previous behavior (controller listening to data changes) to
> maintain the reassignment concurrency by dynamically updating the data in the
> reassignment znode instead of waiting for the current batch to finish and
> doing reassignment batch by batch, which can significantly reduce the
> rebalance time in production clusters. Although directly updating the znode
> can be viewed as an anti-pattern in the long term, it is necessary because
> Kafka does not natively support incrementally submitting more reassignment
> tasks. However, after our Kafka clusters migrated from 0.11 to 2.0, cruise
> control no longer works because the controller behavior has changed. This
> reveals the following problems:
> * These behavior changes may be viewed as internal changes, so compatibility
> is not guaranteed, but I think by convention people do view this as a public
> interface and rely on its compatibility. In this case, I think we should
> clearly document the data contract for the partition reassignment task to
> avoid misuse and to avoid making controller changes that break the defined
> data contract. There may be other cases (e.g. topic deletion) whose data
> contracts need to be clearly defined, and we should keep this in mind when
> making controller changes.
> * Kafka does not natively support incrementally submitting more reassignment
> tasks. If we do want to support that nicely, we should consider changing how
> we store the reassignment data: store it in child nodes and let the
> controller listen for child node changes, similar to what we do for
> /admin/delete_topics.
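
To make the concurrency argument in the issue description concrete, here is a minimal, self-contained sketch (not Kafka or Cruise Control code) that compares the two scheduling strategies: submitting reassignments batch by batch versus topping up the in-flight set as soon as one movement finishes. The class name, method names, and per-movement durations are all hypothetical and chosen only for illustration; real movement times depend on partition size and throttling.

```java
import java.util.PriorityQueue;

// Illustrative simulation only; nothing here talks to Kafka or ZooKeeper.
final class ReassignmentConcurrencySketch {

    // Batch-by-batch: submit up to `concurrency` movements, then wait for
    // ALL of them to finish before submitting the next batch.
    static long batchByBatch(long[] durations, int concurrency) {
        long clock = 0;
        for (int start = 0; start < durations.length; start += concurrency) {
            int end = Math.min(start + concurrency, durations.length);
            long slowest = 0;
            for (int i = start; i < end; i++) {
                slowest = Math.max(slowest, durations[i]);
            }
            clock += slowest; // the batch takes as long as its slowest movement
        }
        return clock;
    }

    // Incremental: keep `concurrency` movements in flight; as soon as one
    // finishes, submit the next pending movement.
    static long incremental(long[] durations, int concurrency) {
        PriorityQueue<Long> finishTimes = new PriorityQueue<>();
        long clock = 0;
        for (long duration : durations) {
            if (finishTimes.size() == concurrency) {
                clock = finishTimes.poll(); // earliest finisher frees a slot
            }
            finishTimes.add(clock + duration);
        }
        while (!finishTimes.isEmpty()) {
            clock = finishTimes.poll(); // drain the remaining movements
        }
        return clock;
    }

    public static void main(String[] args) {
        // Hypothetical durations (arbitrary time units) for six movements.
        long[] durations = {10, 1, 1, 1, 1, 10};
        System.out.println("batch-by-batch: " + batchByBatch(durations, 2)); // 21
        System.out.println("incremental:    " + incremental(durations, 2)); // 14
    }
}
```

With these made-up numbers, the batch strategy idles one slot while each slow movement finishes, whereas the incremental strategy keeps both slots busy, which is the effect the description attributes to dynamically updating the reassignment znode.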