Zhanxiang (Patrick) Huang created KAFKA-7537:
------------------------------------------------
Summary: Only include live brokers in the UpdateMetadataRequest
sent to existing brokers if there is no change in the partition states
Key: KAFKA-7537
URL: https://issues.apache.org/jira/browse/KAFKA-7537
Project: Kafka
Issue Type: Improvement
Components: controller
Reporter: Zhanxiang (Patrick) Huang
Assignee: Zhanxiang (Patrick) Huang
Currently if when brokers join/leave the cluster without any partition states
changes, controller will send out UpdateMetadataRequests containing the states
of all partitions to all brokers. But for existing brokers in the cluster, the
metadata diff between controller and the broker should only be the
"live_brokers" info. Only the brokers with empty metadata cache need the full
UpdateMetadataRequest. Sending the full UpdateMetadataRequest to all brokers
can place nonnegligible memory pressure on the controller side.
Let's say in total we have N brokers, M partitions in the cluster and we want
to add 1 brand new broker in the cluster. With RF=2, the memory footprint per
partition in the UpdateMetadataRequest is ~200 Bytes. In the current controller
implementation, if each of the N RequestSendThreads serializes and sends out
the UpdateMetadataRequest at roughly the same time (which is very likely the
case), we will end up using *(N+1)*M*200B*. In a large kafka cluster, we can
have:
{noformat}
N=99
M=100k
Memory usage to send out UpdateMetadataRequest to all brokers:
100 * 100K * 200B = 2G
However, we only need to send out full UpdateMetadataRequest to the newly added
broker. We only need to include live broker ids (4B * 100 brokers) in the
UpdateMetadataRequest sent to the existing 99 brokers. So the amount of data
that is actully needed will be:
1 * 100K * 200B + 99 * (100 * 4B) = ~21M
We will can potentially reduce 2G / 21M = ~95x memory footprint as well as the
data tranferred in the network.{noformat}
This issue kind of hurts the scalability of a kafka cluster. KIP-380 and
KAFKA-7186 also help to further reduce the controller memory footprint.
In terms of implementation, we can keep some in-memory state in the controller
side to differentiate existing brokers and uninitialized brokers (e.g. brand
new brokers) so that if there is no change in partition states, we only send
out live brokers info to existing brokers.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)