[jira] [Commented] (KAFKA-7149) Reduce assignment data size to improve kafka streams scalability

Guozhang Wang (JIRA) Fri, 14 Sep 2018 13:30:16 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16615339#comment-16615339
 ]


Guozhang Wang commented on KAFKA-7149:
--------------------------------------

One {{Assignment}} object is sent per consumer. It contains the assigned 
partitions and the userData, which is decoded into {{AssignmentInfo}}. 
Note that after the consumer -> assignment information is sent to the broker 
coordinator, the coordinator sends each assignment to its corresponding consumer.
So the mapping from {{Assignment}} to {{AssignmentInfo}} is actually 
one-to-one, not one-to-many.

As a result, each consumer gets only one decoded {{AssignmentInfo}}. Note 
that each consumer does need to know the global tasksByHost map in order to 
support interactive queries (for example, users can ask any instance which 
instance to query for a specific key). 
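
To make the one-to-one mapping concrete, here is a minimal sketch. It does not use Kafka's real classes: {{AssignmentSketch}}, {{InfoSketch}}, {{assign}} and the consumer/host names are hypothetical stand-ins, and the actual {{AssignmentInfo}} is a binary-encoded structure. The sketch only shows that each consumer receives exactly one info object and that every one of them carries the same global tasksByHost map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AssignmentSketch {

    /** Hypothetical stand-in for the decoded userData; not Kafka's AssignmentInfo. */
    static final class InfoSketch {
        final List<String> activeTasks;              // tasks owned by this consumer only
        final Map<String, List<String>> tasksByHost; // global view, identical for every consumer

        InfoSketch(List<String> activeTasks, Map<String, List<String>> tasksByHost) {
            this.activeTasks = activeTasks;
            this.tasksByHost = tasksByHost;
        }
    }

    /** Builds exactly one InfoSketch per consumer; each one embeds the global map. */
    static Map<String, InfoSketch> assign(Map<String, List<String>> tasksByConsumer,
                                          Map<String, String> hostByConsumer) {
        // First pass: collect the global host -> tasks map.
        Map<String, List<String>> tasksByHost = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : tasksByConsumer.entrySet()) {
            String host = hostByConsumer.get(e.getKey());
            tasksByHost.computeIfAbsent(host, h -> new ArrayList<>()).addAll(e.getValue());
        }
        // Second pass: one info object per consumer (one-to-one).
        Map<String, InfoSketch> infoByConsumer = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : tasksByConsumer.entrySet()) {
            infoByConsumer.put(e.getKey(), new InfoSketch(e.getValue(), tasksByHost));
        }
        return infoByConsumer;
    }

    public static void main(String[] args) {
        Map<String, List<String>> tasks = new TreeMap<>();
        tasks.put("consumer-1", Arrays.asList("0_0", "0_1"));
        tasks.put("consumer-2", Arrays.asList("0_2"));

        Map<String, String> hosts = new TreeMap<>();
        hosts.put("consumer-1", "hostA:8080");
        hosts.put("consumer-2", "hostB:8080");

        for (Map.Entry<String, InfoSketch> e : assign(tasks, hosts).entrySet()) {
            System.out.println(e.getKey() + " owns " + e.getValue().activeTasks
                    + ", global view " + e.getValue().tasksByHost);
        }
    }
}
```

Because the global map is repeated in every consumer's info, the total payload grows roughly with (number of consumers) x (size of the global map), which is consistent with the size growth reported in the issue.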


Does that make sense?

> Reduce assignment data size to improve kafka streams scalability
> ----------------------------------------------------------------
>
>                 Key: KAFKA-7149
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7149
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Ashish Surana
>            Assignee: Ashish Surana
>            Priority: Major
>
> We observed that with a high number of partitions, instances, or 
> stream-threads, the assignment-data size grows too fast and we start getting 
> a RecordTooLargeException at the Kafka broker.
> A workaround for this issue is described in a comment at: 
> https://issues.apache.org/jira/browse/KAFKA-6976
> It still limits the scalability of Kafka Streams, because moving around 100MBs 
> of assignment data on each rebalance also affects performance and reliability 
> (timeout exceptions start appearing). This also limits Kafka Streams 
> scale even with a high max.message.bytes setting, because the data size grows 
> quickly with the number of partitions, instances, or stream-threads.
>  
> Solution:
> To address this issue in our cluster, we send the assignment data 
> compressed. We saw the assignment-data size shrink by 8X-10X. This improved 
> Kafka Streams scalability drastically for us, and we can now run it with 
> more than 8,000 partitions.
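
The issue description does not say which codec was used or where in the code the compression was applied. As a rough illustration only, the sketch below assumes GZIP from the JDK's {{java.util.zip}} package; the class {{CompressedUserData}}, its methods, and the sample payload are hypothetical. It compresses an encoded assignment payload before it is sent and restores it on the receiving side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedUserData {

    /** Compresses the encoded assignment bytes with GZIP (codec choice is an assumption). */
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the GZIP stream above writes the trailer, so the array is complete here.
        return out.toByteArray();
    }

    /** Restores the original bytes on the receiving consumer. */
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Made-up payload: repetitive host/partition text, loosely imitating a large host map.
        StringBuilder sb = new StringBuilder();
        for (int p = 0; p < 8000; p++) {
            sb.append("host-").append(p % 50).append(":9092/topic-").append(p).append(';');
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);

        System.out.println("raw bytes: " + raw.length + ", compressed bytes: " + packed.length);
        System.out.println("round trip ok: " + Arrays.equals(raw, decompress(packed)));
    }
}
```

The ratio achieved depends entirely on how repetitive the payload is, so this sketch should not be read as reproducing the 8X-10X figure quoted above.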



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
