[ https://issues.apache.org/jira/browse/KAFKA-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15404147#comment-15404147 ]
Jun Rao commented on KAFKA-2063: -------------------------------- For #3, the main reason for people to customize partition-level fetch size is to accommodate large messages. We can make the default max_reponse_bytes sth like 10MB, which is 10 times larger than default message size. If max_reponse_bytes >= partition-level fetch size, we just ignore the latter. Otherwise, we can error out and advise users to increase max_reponse_bytes if truly needed. > Bound fetch response size > ------------------------- > > Key: KAFKA-2063 > URL: https://issues.apache.org/jira/browse/KAFKA-2063 > Project: Kafka > Issue Type: Improvement > Reporter: Jay Kreps > > Currently the only bound on the fetch response size is > max.partition.fetch.bytes * num_partitions. There are two problems: > 1. First this bound is often large. You may chose > max.partition.fetch.bytes=1MB to enable messages of up to 1MB. However if you > also need to consume 1k partitions this means you may receive a 1GB response > in the worst case! > 2. The actual memory usage is unpredictable. Partition assignment changes, > and you only actually get the full fetch amount when you are behind and there > is a full chunk of data ready. This means an application that seems to work > fine will suddenly OOM when partitions shift or when the application falls > behind. > We need to decouple the fetch response size from the number of partitions. > The proposal for doing this would be to add a new field to the fetch request, > max_bytes which would control the maximum data bytes we would include in the > response. > The implementation on the server side would grab data from each partition in > the fetch request until it hit this limit, then send back just the data for > the partitions that fit in the response. The implementation would need to > start from a random position in the list of topics included in the fetch > request to ensure that in a case of backlog we fairly balance between > partitions (to avoid first giving just the first partition until that is > exhausted, then the next partition, etc). > This setting will make the max.partition.fetch.bytes field in the fetch > request much less useful and we should discuss just getting rid of it. > I believe this also solves the same thing we were trying to address in > KAFKA-598. The max_bytes setting now becomes the new limit that would need to > be compared to max_message size. This can be much larger--e.g. setting a 50MB > max_bytes setting would be okay, whereas now if you set 50MB you may need to > allocate 50MB*num_partitions. > This will require evolving the fetch request protocol version to add the new > field and we should do a KIP for it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)