GitHub user onurkaraman opened a pull request:

    https://github.com/apache/kafka/pull/1484

    KAFKA-3810: replication of internal topics should not be limited by 
replica.fetch.max.bytes

    From the kafka-dev mailing list discussion: [DISCUSS] scalability limits in 
the coordinator
    
    There's a scalability limit on the new consumer/coordinator: all of a 
    group's metadata has to fit into one message. This caps the combination 
    of consumer group size, topic subscription sizes, topic assignment 
    sizes, and any remaining member metadata.
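
    A rough back-of-envelope calculation shows how quickly this limit is 
    reached. Every constant below is an illustrative assumption, not the 
    exact wire format; the point is that the product of group size and 
    subscription/assignment size blows far past the default ~1 MB 
    message.max.bytes:

        // Hypothetical size estimate; every constant here is an assumption.
        object GroupMetadataSizeEstimate {
          def main(args: Array[String]): Unit = {
            val members = 100            // consumers in the group (assumed)
            val topics = 3000            // topics per subscription (assumed)
            val avgTopicNameBytes = 40   // assumed average topic name length
            val perMemberOverhead = 100  // member id, client id, etc. (assumed)

            // Each member's metadata carries its full subscription, and the
            // leader's assignment roughly repeats those topic names.
            val subscriptionBytes = topics.toLong * avgTopicNameBytes
            val perMemberBytes = perMemberOverhead + 2 * subscriptionBytes
            val totalBytes = members * perMemberBytes

            println(f"estimated group metadata: ${totalBytes / (1024.0 * 1024.0)}%.1f MB")
            println("default message.max.bytes is about 1 MB, so this does not fit")
          }
        }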
    
    Under more strenuous use cases like mirroring clusters with thousands of 
topics, this limitation can be reached even after applying gzip to the 
__consumer_offsets topic.
    
    Various options were proposed in the discussion:
    1. Config change: reduce the number of consumers in the group. This isn't 
always a realistic answer in more strenuous use cases like MirrorMaker clusters 
or for auditing.
    2. Config change: split the group into smaller groups which together 
    cover the full set of topics, giving each group member a smaller 
    subscription (e.g. g1 gets topics starting with a-m while g2 gets 
    topics starting with n-z). This would be operationally painful to 
    manage.
    3. Config change: split the topics among members of the group. Again this 
gives each group member a smaller subscription. This would also be 
operationally painful to manage.
    4. Config change: bump up KafkaConfig.messageMaxBytes (a broker-level 
    config with a per-topic override) and KafkaConfig.replicaFetchMaxBytes 
    (a broker-level config). Applying the override to just the 
    __consumer_offsets topic seems relatively harmless (see the command 
    after this list), but bumping up the broker-level replicaFetchMaxBytes 
    would probably need more attention.
    5. Config change: try different compression codecs. Based on 2 minutes of 
googling, it seems like lz4 and snappy are faster than gzip but have worse 
compression, so this probably won't help.
    6. Implementation change: support sending the regex over the wire instead 
of the fully expanded topic subscriptions. I think people said in the past that 
different languages have subtle differences in regex, so this doesn't play 
nicely with cross-language groups.
    7. Implementation change: maybe we can reverse the mapping? Instead of 
    mapping from member to subscriptions, we can map a subscription to a 
    list of members (see the sketch after this list).
    8. Implementation change: maybe we can try to break apart the 
    subscription and assignments from the same SyncGroupRequest into 
    multiple records? They can still go to the same message set and get 
    appended together. This way the limit becomes the segment size, which 
    shouldn't be a problem. This can be tricky to get right because we're 
    currently keying these messages on the group, so I think records from 
    the same rebalance might accidentally compact one another, but my 
    understanding of compaction isn't that great.
    9. Implementation change: try to apply some tricks on the assignment 
serialization to make it smaller.
    10. Config and Implementation change: bump up the __consumer_offsets 
    topic messageMaxBytes and (from Jun Rao) fix how we deal with the case 
    when a message is larger than the fetch size. Today, if the fetch size 
    is smaller than the message size, the consumer will get stuck. Instead, 
    we can simply return the full message if it's larger than the fetch 
    size without requiring the consumer to manually adjust the fetch size.
    11. Config and Implementation change: same as above, but only apply the 
    special fetch logic when fetching from internal topics.
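
    For the topic-level half of option 4, the override can be applied with 
    something like the following (the exact kafka-configs.sh invocation 
    depends on the Kafka version; max.message.bytes is the per-topic 
    override of the broker's message.max.bytes):

        $ kafka-configs.sh --zookeeper localhost:2181 --alter \
            --entity-type topics --entity-name __consumer_offsets \
            --add-config max.message.bytes=10485760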
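
    To make option 7 concrete, here is a minimal sketch of the reversed 
    mapping; the types and names are hypothetical, not Kafka's actual 
    classes. The win is that groups like MirrorMaker, where every member 
    shares one huge subscription, store the topic list once instead of once 
    per member:

        object ReverseMapping {
          def main(args: Array[String]): Unit = {
            // Today (conceptually): member -> its full topic subscription.
            val memberToTopics: Map[String, Set[String]] = Map(
              "consumer-1" -> Set("a", "b", "c"),
              "consumer-2" -> Set("a", "b", "c"),
              "consumer-3" -> Set("a", "b", "d")
            )

            // Reversed: each distinct subscription is stored once, mapped
            // to the members that share it.
            val topicsToMembers: Map[Set[String], List[String]] =
              memberToTopics.groupBy(_._2).map { case (topics, entries) =>
                topics -> entries.keys.toList.sorted
              }

            topicsToMembers.foreach { case (topics, members) =>
              println(s"${topics.mkString(",")} -> ${members.mkString(",")}")
            }
          }
        }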
    
    This PR provides an implementation of option 11.
    
    That being said, I'm not very happy with this approach as it essentially 
doesn't honor the "replica.fetch.max.bytes" config. Better alternatives are 
definitely welcome!
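
    For reference, here is a minimal sketch of the option 11 idea, not the 
    actual patch: Message, readFirstMessage, and readUpTo are hypothetical 
    stand-ins for the broker's log-read path. If the first message exceeds 
    the fetch size and the topic is internal, it is returned whole anyway 
    so the fetcher makes progress:

        case class Message(topic: String, sizeInBytes: Int, payload: Array[Byte])

        object InternalTopicFetch {
          val internalTopics = Set("__consumer_offsets")

          def read(topic: String, fetchMaxBytes: Int,
                   readFirstMessage: () => Message,
                   readUpTo: Int => Seq[Message]): Seq[Message] = {
            val first = readFirstMessage()
            if (first.sizeInBytes > fetchMaxBytes && internalTopics.contains(topic)) {
              // Oversized message on an internal topic: return it whole so
              // replication does not get stuck, at the cost of exceeding
              // replica.fetch.max.bytes for this one fetch.
              Seq(first)
            } else {
              // Normal path: honor the configured fetch size.
              readUpTo(fetchMaxBytes)
            }
          }

          def main(args: Array[String]): Unit = {
            val big = Message("__consumer_offsets", 5 * 1024 * 1024, Array.emptyByteArray)
            val out = read(big.topic, fetchMaxBytes = 1024 * 1024,
              readFirstMessage = () => big,
              readUpTo = _ => Seq.empty)
            println(s"returned ${out.size} oversized message(s)")
          }
        }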

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/onurkaraman/kafka KAFKA-3810

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/1484.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1484
    
----
commit f8c1bb2d07df13b90ec338ddc4de08d58aab153d
Author: Onur Karaman <okara...@linkedin.com>
Date:   2016-06-09T09:43:25Z

    replication of internal topics should not be limited by 
replica.fetch.max.bytes

----

