Hi Kamal,
Thanks for your response.

I know a reasonable amount about the Apache Kafka consumer and protocol. In
general, the limits are applied as soft limits: read some data, check whether
the limit has been reached, and repeat until it has. It is very common to go
slightly over the limit as a result, typically by up to one record batch.
Consequently, the consumer pays no attention to the amount of data returned
to it. It does pay attention to the number of records returned, in the way
that it batches them for delivery to the application, but that's it.
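As a rough sketch of that soft-limit behaviour (hypothetical code, not the
actual broker or consumer implementation): whole record batches are
accumulated and the size check happens only between batches, so the result
can overshoot the limit by up to one batch:

```python
def collect_fetch_response(record_batches, max_bytes):
    """Accumulate whole record batches until max_bytes is reached.

    The size check happens only between batches, so the result may
    exceed max_bytes by up to one record batch -- a soft limit.
    """
    collected, total = [], 0
    for batch in record_batches:
        if total >= max_bytes:
            break
        collected.append(batch)  # whole batches only, never split
        total += len(batch)
    return collected, total

# Three 400-byte batches against a 1000-byte limit: all three are
# returned (1200 bytes), overshooting by less than one batch.
batches, size = collect_fetch_response([b"x" * 400] * 3, 1000)
```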

By specifying fetch.max.bytes of 50MB, the consumer might receive 50MB of
data in response to a fetch. So, I think there is essentially "permission"
from the consumer to return up to 50MB. If efficiency demands that you fetch
at least 4MB from remote storage, keeping well within fetch.max.bytes seems
fine to me. I can imagine that a broker configuration would be helpful in
case the characteristics of a particular storage provider needed to be
accommodated.
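For reference, the two client-side limits in play and their defaults
(values from the consumer configuration; shown here as a plain sketch,
not actual client code):

```python
# The two client-side fetch limits discussed above, with their defaults.
# Both are soft limits per KIP-74: a response may exceed them by up to
# one record batch.
consumer_fetch_limits = {
    "fetch.max.bytes": 50 * 1024 * 1024,           # 50 MB, whole response
    "max.partition.fetch.bytes": 1 * 1024 * 1024,  # 1 MB, per partition
}
```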

I wonder if Emanuele Sabellico can add any information about how librdkafka
manages the limits in FetchRequest and how sensitive it is to more data being
returned for a partition than expected.

Thanks,
Andrew

________________________________________
From: Kamal Chandraprakash <kamal.chandraprak...@gmail.com>
Sent: 08 May 2025 19:24
To: dev@kafka.apache.org <dev@kafka.apache.org>
Subject: Re: [DISCUSS] KIP-1178: Introduce remote.max.partition.fetch.bytes in 
Consumer

Hi Andrew,

Thanks for the review!

The initial idea was to introduce the configuration on the broker side,
similar to how remote.fetch.max.wait.ms complements fetch.max.wait.ms:

   - fetch.max.wait.ms — configured via ConsumerConfig
   
<https://sourcegraph.com/github.com/apache/kafka/-/blob/clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java?L205>
   - remote.fetch.max.wait.ms — configured via RemoteLogManagerConfig
   
<https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/storage/RemoteLogManagerConfig.java?L185>
    (broker-side)

With KIP-74
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-74%3A+Add+Fetch+Response+Size+Limit+in+Bytes>,
the max.partition.fetch.bytes
<https://sourcegraph.com/github.com/apache/kafka/-/blob/clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java?L218-224>
is
treated as a soft limit. This means the actual size of the FETCH response
returned to the client may exceed this value, particularly when a single
RecordBatch is larger than the configured limit.

One area that remains unclear is how third-party clients behave when they
are configured with the default values:

   - max.partition.fetch.bytes = 1 MB and
   - fetch.max.bytes = 50 MB

If the broker responds with a 4 MB partition response containing multiple
RecordBatches, does the client fail to process the records
due to exceeding max.partition.fetch.bytes, or does it handle the larger
response gracefully?

Thanks,
Kamal

On Thu, May 8, 2025 at 7:19 PM Andrew Schofield <
andrew_schofield_j...@outlook.com> wrote:

> Hi Kamal,
> Thanks for the KIP.
>
> While it makes a lot of sense to me to be able to control the fetching
> from remote
> storage to make sure it's sympathetic to the characteristics of the
> storage provider,
> it seems to me that extending this concept all the way to the individual
> consumers
> is not a good idea. You might have different consumers specifying their own
> wildly different values, when really you want a consistent configuration
> which
> applies whenever data is fetched from remote storage. Could a broker or
> topic
> config be used to achieve this more effectively? It worries me whenever we
> have
> a configuration which would ideally be used by all consumers setting the
> same
> value. It suggests that they shouldn't be able to provide their own values
> at all.
>
> Thanks,
> Andrew
> ________________________________________
> From: Kamal Chandraprakash <kamal.chandraprak...@gmail.com>
> Sent: 08 May 2025 12:07
> To: dev@kafka.apache.org <dev@kafka.apache.org>
> Subject: [DISCUSS] KIP-1178: Introduce remote.max.partition.fetch.bytes in
> Consumer
>
> Hi all,
>
> I've opened KIP-1178 to add a new config
> 'remote.max.partition.fetch.bytes' in the consumer. This config allows the
> consumer to read from remote storage faster.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1178%3A+Introduce+remote.max.partition.fetch.bytes+config+in+Consumer
>
> Please take a look and share your thoughts.
>
> Thanks,
> Kamal
>
