On Wed, Nov 6, 2019 at 6:28 PM Tom Bentley <tbent...@redhat.com> wrote:
> Hi Ying,
>
> > Because only inactive segments can be shipped to remote storage, to be
> > able to ship log data as soon as possible, we will roll log segments
> > very fast (e.g. every half hour).
>
> So that means a consumer which gets behind by half an hour will find its
> reads being served from remote storage.

No, the segments are shipped to remote storage as soon as possible, but the
local segment is not deleted until a configurable time has passed (e.g. 6
hours). Consumer requests are served from local storage as long as the local
copy is still available; only after those 6 hours (or longer) are they served
from remote storage.

> And, if I understand the proposed algorithm, each such consumer fetch
> request could result in a separate fetch request from the remote storage.
> I.e. there's no mechanism to amortize the cost of the fetching between
> multiple consumers fetching similar ranges?

We can have a small in-memory cache on the broker (a rough sketch is at the
end of this mail), but this is not a high priority right now. In any normal
case, a Kafka consumer should not lag by more than several hours, so only in
some very extreme cases does a consumer have to read from remote storage at
all, and it is very rare for two or more consumers to read the same piece of
remote data at about the same time.

> (Actually the doc for RemoteStorageManager.read() says "It will read at
> least one batch, if the 1st batch size is larger than maxBytes.". Does
> that mean the broker might have to retry with increased maxBytes if the
> first request fails to read a batch? If so, how does it know how much to
> increase maxBytes by?)

No, there is no retry; the broker just keeps reading until at least one full
batch has been received. The logic is exactly the same as the existing local
segment read (see the second sketch below).
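To make the cache idea above a bit more concrete, here is a rough sketch of
what a broker-side cache could look like. This is only an illustration: none
of the class or method names below are part of the KIP.

    import java.nio.ByteBuffer;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Hypothetical sketch only -- not part of the KIP's interfaces.
    public class RemoteReadCache {
        private static final int MAX_ENTRIES = 1024;

        // Access-ordered LinkedHashMap gives simple LRU eviction:
        // the eldest entry is dropped once we exceed MAX_ENTRIES.
        private final Map<String, ByteBuffer> cache =
            new LinkedHashMap<String, ByteBuffer>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(
                        Map.Entry<String, ByteBuffer> eldest) {
                    return size() > MAX_ENTRIES;
                }
            };

        // Key is "<remote segment id>:<byte position>", so concurrent
        // fetches for the same range hit the same entry.
        public synchronized ByteBuffer get(String segmentAndPosition) {
            return cache.get(segmentAndPosition);
        }

        public synchronized void put(String segmentAndPosition,
                                     ByteBuffer data) {
            cache.put(segmentAndPosition, data);
        }
    }

A plain LRU should be enough here, since the cache only needs to absorb the
rare case of several lagging consumers reading the same remote range at
about the same time.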
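And to illustrate the read logic: it is roughly the loop below. Again a
simplified sketch; RemoteBatchReader and Batch are stand-ins I made up, not
the actual RemoteStorageManager API.

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    public class RemoteRead {
        interface Batch { int sizeInBytes(); }
        interface RemoteBatchReader extends Iterator<Batch> {}

        // Read whole batches until maxBytes is reached. The first batch
        // is always returned, even if it alone exceeds maxBytes -- the
        // same "at least one batch" rule the local segment read follows.
        // There is no retry with a larger maxBytes.
        static List<Batch> read(RemoteBatchReader reader, int maxBytes) {
            List<Batch> result = new ArrayList<>();
            int bytesSoFar = 0;
            while (reader.hasNext()) {
                Batch batch = reader.next();
                if (!result.isEmpty()
                        && bytesSoFar + batch.sizeInBytes() > maxBytes)
                    break; // would exceed maxBytes, and we have a batch
                result.add(batch);
                bytesSoFar += batch.sizeInBytes();
            }
            return result;
        }
    }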