>So that means a consumer which gets behind by half an hour will find its reads being served from remote storage. And, if I understand the proposed algorithm, each such consumer fetch request could result in a separate fetch request from the remote storage. I.e. there's no mechanism to amortize the cost of the fetching between multiple consumers fetching similar ranges?
Local log segments are deleted according to the local log.retention.ms/.bytes settings, even though they may already have been copied to remote storage. Consumers can still fetch messages from local storage as long as those segments have not yet been deleted under the local retention policy. They are served from remote storage only when the data is no longer available locally.

Thanks,
Satish.

On Thu, Nov 7, 2019 at 7:58 AM Tom Bentley <tbent...@redhat.com> wrote:

> Hi Ying,
>
> > Because only inactive segments can be shipped to remote storage, to be
> > able to ship log data as soon as possible, we will roll log segment
> > very fast (e.g. every half hour).
>
> So that means a consumer which gets behind by half an hour will find its
> reads being served from remote storage. And, if I understand the proposed
> algorithm, each such consumer fetch request could result in a separate
> fetch request from the remote storage. I.e. there's no mechanism to
> amortize the cost of the fetching between multiple consumers fetching
> similar ranges?
>
> (Actually the doc for RemoteStorageManager.read() says "It will read at
> least one batch, if the 1st batch size is larger than maxBytes." Does
> that mean the broker might have to retry with increased maxBytes if the
> first request fails to read a batch? If so, how does it know how much to
> increase maxBytes by?)
>
> Thanks,
>
> Tom
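
For readers following the thread, here is a minimal sketch in Java of the routing Satish describes: a fetch is served from the local log while the requested offset is still within local retention, and falls back to remote storage once the local copy has been deleted. LocalLog, Records, and both read() signatures are hypothetical stand-ins; RemoteStorageManager is the KIP-405 plug-in name, but its real method signatures differ.

    // Hypothetical stand-in types; the real KIP-405 interfaces differ.
    interface Records {}

    interface LocalLog {
        long logStartOffset();                    // first offset still on local disk
        Records read(long offset, int maxBytes);  // read from local segments
    }

    interface RemoteStorageManager {              // KIP-405 plug-in (signature simplified here)
        Records read(long offset, int maxBytes);
    }

    class TieredFetchRouter {
        private final LocalLog localLog;
        private final RemoteStorageManager remoteStorage;

        TieredFetchRouter(LocalLog localLog, RemoteStorageManager remoteStorage) {
            this.localLog = localLog;
            this.remoteStorage = remoteStorage;
        }

        Records read(long fetchOffset, int maxBytes) {
            // Segments may already have been copied to remote storage, but
            // they are only deleted locally once log.retention.ms/.bytes
            // expire, so a fetch within local retention stays on local disk.
            if (fetchOffset >= localLog.logStartOffset()) {
                return localLog.read(fetchOffset, maxBytes);
            }
            // The offset has been deleted locally: serve it from remote storage.
            return remoteStorage.read(fetchOffset, maxBytes);
        }
    }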
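
On Tom's maxBytes question: one reading of "It will read at least one batch, if the 1st batch size is larger than maxBytes" is that the first batch is always returned whole even when it exceeds maxBytes, so the broker never needs to retry with a larger maxBytes. That would mirror how the ordinary consumer fetch treats fetch.max.bytes / max.partition.fetch.bytes, where the first record batch is returned regardless of size so the consumer can make progress. A sketch of that contract, with hypothetical Batch/BatchReader types:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical types for illustration only.
    interface Batch { int sizeInBytes(); }

    interface BatchReader {
        Batch firstBatchAt(long offset);  // first complete batch at/after offset, or null
        Batch nextBatch(Batch current);   // following batch, or null at end of log
    }

    class AtLeastOneBatchRead {
        static List<Batch> read(BatchReader reader, long fromOffset, int maxBytes) {
            List<Batch> result = new ArrayList<>();
            int bytes = 0;
            for (Batch b = reader.firstBatchAt(fromOffset); b != null; b = reader.nextBatch(b)) {
                // The first batch is always included, however large, which is
                // what would make a retry with a bigger maxBytes unnecessary;
                // later batches are only added while they fit under maxBytes.
                if (!result.isEmpty() && bytes + b.sizeInBytes() > maxBytes) {
                    break;
                }
                result.add(b);
                bytes += b.sizeInBytes();
            }
            return result;
        }
    }

Under that reading the answer to Tom's question would be that no retry is needed, though only the KIP authors can confirm the intended contract.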