Opened https://github.com/apache/kafka/pull/19336

On 2025/03/27 06:32:33 Luke Chen wrote:
> Hi Stanislav,
> 
> Thanks for raising this! I totally forgot about it!
> 
> For the documentation improvement, I think more is better.
> All you listed above can be done together.
> The "remote.fetch.max.wait.ms" config is also a good place to add this
> missing info.
> 
> Thanks.
> Luke
> 
> On Thu, Mar 27, 2025 at 4:27 AM Stanislav Kozlovski <
> stanislavkozlov...@apache.org> wrote:
> 
> > Hey all,
> >
> > I was doing a deep dive on the internals of KIP-405's read path and I was
> > surprised to learn that the broker only fetches remote data for ONE
> > partition in a given FetchRequest. In other words, if a consumer sends a
> > FetchRequest requesting 50 topic-partitions, and each partition's requested
> > offset is not stored locally - the broker will fetch and respond with just
> > one partition's worth of data from the remote store, and the rest will be
> > empty.
> >
> > I found this very unintuitive (shocking, really), given that our default for
> > the total fetch response size is 50 MiB and the per-partition default is 1
> > MiB. In essence, this means a fetch response may be 50x smaller than it ought
> > to be, and become the bottleneck for throughput when performing remote
> > (historical) reads.
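To make the 50x figure concrete, a quick back-of-the-envelope calculation using the consumer defaults cited above (fetch.max.bytes = 50 MiB, max.partition.fetch.bytes = 1 MiB):

```python
# Consumer defaults referenced above (from the Kafka consumer configuration docs).
FETCH_MAX_BYTES = 50 * 1024 * 1024       # fetch.max.bytes default, 50 MiB
MAX_PARTITION_FETCH_BYTES = 1024 * 1024  # max.partition.fetch.bytes default, 1 MiB

# If only one partition's worth of remote data comes back per request, the
# response can be this many times smaller than the configured maximum:
shrink_factor = FETCH_MAX_BYTES // MAX_PARTITION_FETCH_BYTES
print(shrink_factor)  # 50
```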
> >
> > I synced very briefly with Satish offline and realized there is a JIRA
> > tracking this (KAFKA-14915
> > <https://issues.apache.org/jira/browse/KAFKA-14915> I believe), but I
> > figured it's better to raise the discussion with the community than
> > continue async.
> >
> > I see a few negatives with this behavior. In order of priority:
> > 1. it is unintuitive and not documented
> > 2. it is a potential performance bottleneck
> > 3. it somewhat obsoletes great features like read caching and prefetching
> > that have been implemented in popular KIP-405 plugins (in particular the
> > Aiven one, which supports all three major clouds). The goal of these features, as I
> > understand them, is to increase throughput and reduce latency, but the
> > plugin may very well NOT be given a chance to serve data from cache since
> > it'll be called for only one partition per request.
> >
> > I acknowledge the proper implementation isn't straightforward, so
> > I understand why a version with this behavior was shipped. I am not sure if
> > I would have marked the feature GA though.
> >
> > In any case, I particularly want to begin this discussion by focusing on 1),
> > the lack of documentation, since it is the easiest to fix.
> >
> > I didn't find this information in KIP-405
> > <
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage
> > >
> > nor in the documentation of the fetch.max.bytes
> > <https://kafka.apache.org/documentation/#consumerconfigs_fetch.max.bytes>
> > config.
> > I couldn't find it through googling. I even asked all popular commercial
> > LLMs.
> >
> > How should we best document this behavior? My default was to add it to the
> > fetch.max.bytes config.
> >
> > A short note in KIP-405 would be useful too, but in my opinion that document
> > is too verbose for instructing users. We had Tiered Storage Early
> > Access Release Notes
> > <https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Tiered+Storage+Early+Access+Release+Notes>
> > (it wasn't mentioned there either)... maybe we could create a similar page
> > marking current limitations and link to it (as one of the first things) from
> > the KIP?
> >
> 
