[
https://issues.apache.org/jira/browse/KAFKA-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jun Rao resolved KAFKA-7537.
----------------------------
Resolution: Fixed
Fix Version/s: 2.2.0
Merged the PR to trunk.
> Only include live brokers in the UpdateMetadataRequest sent to existing
> brokers if there is no change in the partition states
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-7537
> URL: https://issues.apache.org/jira/browse/KAFKA-7537
> Project: Kafka
> Issue Type: Improvement
> Components: controller
> Reporter: Zhanxiang (Patrick) Huang
> Assignee: Zhanxiang (Patrick) Huang
> Priority: Major
> Fix For: 2.2.0
>
>
> Currently if when brokers join/leave the cluster without any partition states
> changes, controller will send out UpdateMetadataRequests containing the
> states of all partitions to all brokers. But for existing brokers in the
> cluster, the metadata diff between controller and the broker should only be
> the "live_brokers" info. Only the brokers with empty metadata cache need the
> full UpdateMetadataRequest. Sending the full UpdateMetadataRequest to all
> brokers can place nonnegligible memory pressure on the controller side.
> Let's say in total we have N brokers, M partitions in the cluster and we want
> to add 1 brand new broker in the cluster. With RF=2, the memory footprint per
> partition in the UpdateMetadataRequest is ~200 Bytes. In the current
> controller implementation, if each of the N RequestSendThreads serializes and
> sends out the UpdateMetadataRequest at roughly the same time (which is very
> likely the case), we will end up using *(N+1)*M*200B*. In a large kafka
> cluster, we can have:
> {noformat}
> N=99
> M=100k
> Memory usage to send out UpdateMetadataRequest to all brokers:
> 100 * 100K * 200B = 2G
> However, we only need to send out full UpdateMetadataRequest to the newly
> added broker. We only need to include live broker ids (4B * 100 brokers) in
> the UpdateMetadataRequest sent to the existing 99 brokers. So the amount of
> data that is actully needed will be:
> 1 * 100K * 200B + 99 * (100 * 4B) = ~21M
> We will can potentially reduce 2G / 21M = ~95x memory footprint as well as
> the data tranferred in the network.{noformat}
>
> This issue kind of hurts the scalability of a kafka cluster. KIP-380 and
> KAFKA-7186 also help to further reduce the controller memory footprint.
>
> In terms of implementation, we can keep some in-memory state in the
> controller side to differentiate existing brokers and uninitialized brokers
> (e.g. brand new brokers) so that if there is no change in partition states,
> we only send out live brokers info to existing brokers.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)