Hi, Divij,

Thanks for the explanation.
1. Hmm, the default implementation of RLMM does local caching, right?
Currently, we also cache all segment metadata in the brokers without
KIP-405. Do you see a need to change that?

2, 3, 4: Yes, your explanation makes sense. However, currently,
RemoteLogMetadataManager.listRemoteLogSegments() doesn't specify a
particular order for the iterator. Do you intend to change that?

Thanks,

Jun

On Tue, Nov 8, 2022 at 3:31 AM Divij Vaidya <divijvaidy...@gmail.com> wrote:

> Hey Jun
>
> Thank you for your comments.
>
> *1. "RLMM implementor could ensure that listRemoteLogSegments() is fast"*
> This would be ideal, but pragmatically it is difficult to ensure that
> listRemoteLogSegments() is fast. The possibility of a large number of
> segments (much larger than what Kafka currently handles with local
> storage today) makes it infeasible to adopt strategies such as local
> caching to improve the performance of listRemoteLogSegments(). Apart
> from caching (which won't work due to size limitations), I can't think
> of other strategies that would eliminate the need for I/O operations
> proportional to the total number of segments. Please advise if you have
> something in mind.
>
> 2. *"If the size exceeds the retention size, we need to determine the
> subset of segments to delete to bring the size within the retention
> limit. Do we need to call
> RemoteLogMetadataManager.listRemoteLogSegments() to determine that?"*
> Yes, we need to call listRemoteLogSegments() to determine which segments
> should be deleted. But there is a difference from the use case we are
> trying to optimize with this KIP. To determine the subset of segments to
> delete, we only read metadata for the segments which would be deleted
> via listRemoteLogSegments(). But to determine the totalLogSize, which is
> required every time the size-based retention logic executes, we read the
> metadata of *all* the segments in remote storage.
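The cost asymmetry described in point 2 can be sketched as follows. The types and helper names below are hypothetical stand-ins for illustration, not the actual KIP-405 interfaces, and the sketch assumes the iterator returns segments oldest-first (the ordering question raised in this thread):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified sketch of why size-based retention reads *all* segment
// metadata while the deletion step only reads part of it.
class SizeRetentionSketch {

    // Hypothetical stand-in for the KIP-405 segment metadata type.
    static class SegmentMetadata {
        final long segmentSizeInBytes;
        SegmentMetadata(long segmentSizeInBytes) {
            this.segmentSizeInBytes = segmentSizeInBytes;
        }
    }

    // Stand-in for RemoteLogMetadataManager.listRemoteLogSegments();
    // in a real RLMM each call may involve remote I/O.
    static Iterator<SegmentMetadata> listRemoteLogSegments(List<SegmentMetadata> store) {
        return store.iterator();
    }

    // Computing totalLogSize today: one full pass over every segment's
    // metadata, O(total segments), on every size-based retention run.
    static long totalLogSize(List<SegmentMetadata> store) {
        long total = 0;
        Iterator<SegmentMetadata> it = listRemoteLogSegments(store);
        while (it.hasNext()) {
            total += it.next().segmentSizeInBytes;
        }
        return total;
    }

    // By contrast, choosing the deletion subset only touches the
    // oldest segments that will actually be deleted.
    static List<SegmentMetadata> segmentsToDelete(List<SegmentMetadata> store,
                                                  long retentionBytes) {
        long excess = totalLogSize(store) - retentionBytes;
        List<SegmentMetadata> toDelete = new ArrayList<>();
        Iterator<SegmentMetadata> it = listRemoteLogSegments(store);
        while (it.hasNext() && excess > 0) {
            SegmentMetadata s = it.next();
            toDelete.add(s);
            excess -= s.segmentSizeInBytes;
        }
        return toDelete;
    }
}
```

With three 100-byte segments and a 150-byte retention limit, `totalLogSize` iterates all three segments, while the deletion loop stops after the two oldest.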
> Hence, the number of results returned by
> *RemoteLogMetadataManager.listRemoteLogSegments()* is different when we
> are calculating totalLogSize vs. when we are determining the subset of
> segments to delete.
>
> 3. *"Also, what about time-based retention? To make that efficient, do
> we need to make some additional interface changes?"*
> No. Note that the time complexity of determining the segments for
> retention is different for time-based vs. size-based retention. For
> time-based retention, the time complexity is a function of the number of
> segments which are "eligible for deletion" (since we only read metadata
> for segments which would be deleted), whereas for size-based retention,
> the time complexity is a function of "all segments" available in remote
> storage (the metadata of all segments needs to be read to calculate the
> total size). As you may observe, this KIP will bring the time complexity
> of time-based retention and size-based retention to the same function.
>
> 4. Also, please note that the new API introduced in this KIP enables us
> to provide a metric for the total size of data stored in remote storage.
> Without the API, calculating this metric would be very expensive with
> *listRemoteLogSegments()*.
> I understand that your motivation here is to avoid polluting the
> interface with optimization-specific APIs, and I agree with that goal.
> But I believe that the new API proposed in the KIP brings a significant
> improvement, and there is no other workaround available to achieve the
> same performance.
>
> Regards,
> Divij Vaidya
>
> On Tue, Nov 8, 2022 at 12:12 AM Jun Rao <j...@confluent.io.invalid> wrote:
>
> > Hi, Divij,
> >
> > Thanks for the KIP. Sorry for the late reply.
> >
> > The motivation of the KIP is to improve the efficiency of size-based
> > retention. I am not sure the proposed changes are enough.
> > For example, if the size exceeds the retention size, we need to
> > determine the subset of segments to delete to bring the size within
> > the retention limit. Do we need to call
> > RemoteLogMetadataManager.listRemoteLogSegments() to determine that?
> > Also, what about time-based retention? To make that efficient, do we
> > need to make some additional interface changes?
> >
> > An alternative approach is for the RLMM implementor to make sure
> > that RemoteLogMetadataManager.listRemoteLogSegments() is fast (e.g.,
> > with local caching). This way, we could keep the interface simple.
> > Have we considered that?
> >
> > Thanks,
> >
> > Jun
> >
> > On Wed, Sep 28, 2022 at 6:28 AM Divij Vaidya <divijvaidy...@gmail.com>
> > wrote:
> >
> > > Hey folks
> > >
> > > Does anyone else have any thoughts on this before I propose this
> > > for a vote?
> > >
> > > --
> > > Divij Vaidya
> > >
> > > On Mon, Sep 5, 2022 at 12:57 PM Satish Duggana <satish.dugg...@gmail.com>
> > > wrote:
> > >
> > > > Thanks for the KIP Divij!
> > > >
> > > > This is a nice improvement to avoid recalculation of the size.
> > > > Customized RLMMs can implement the best possible approach by
> > > > caching or maintaining the size in an efficient way. But this is
> > > > not a big concern for the default topic-based RLMM, as mentioned
> > > > in the KIP.
> > > >
> > > > ~Satish.
> > > >
> > > > On Wed, 13 Jul 2022 at 18:48, Divij Vaidya <divijvaidy...@gmail.com>
> > > > wrote:
> > > >
> > > > > Thank you for your review, Luke.
> > > > >
> > > > > > Reg: would the new `RemoteLogSizeBytes` metric be a
> > > > > > performance overhead? Although we move the calculation to a
> > > > > > separate API, we still can't assume users will implement a
> > > > > > light-weight method, right?
> > > > > This metric would be logged using information that is already
> > > > > being calculated for handling the remote retention logic;
> > > > > hence, no additional work is required to calculate this metric.
> > > > > More specifically, whenever RemoteLogManager calls the
> > > > > getRemoteLogSize API, this metric would be captured. This API
> > > > > call is made every time RemoteLogManager wants to handle
> > > > > expired remote log segments (which should be periodic). Does
> > > > > that address your concern?
> > > > >
> > > > > Divij Vaidya
> > > > >
> > > > > On Tue, Jul 12, 2022 at 11:01 AM Luke Chen <show...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Divij,
> > > > > >
> > > > > > Thanks for the KIP!
> > > > > >
> > > > > > I think it makes sense to delegate the responsibility of the
> > > > > > calculation to the specific RemoteLogMetadataManager
> > > > > > implementation. But one thing I'm not quite sure about is
> > > > > > whether the new `RemoteLogSizeBytes` metric would be a
> > > > > > performance overhead. Although we move the calculation to a
> > > > > > separate API, we still can't assume users will implement a
> > > > > > light-weight method, right?
> > > > > >
> > > > > > Thank you.
> > > > > > Luke
> > > > > >
> > > > > > On Fri, Jul 1, 2022 at 5:47 PM Divij Vaidya <divijvaidy...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-852%3A+Optimize+calculation+of+size+for+log+in+remote+tier
> > > > > > >
> > > > > > > Hey folks
> > > > > > >
> > > > > > > Please take a look at this KIP, which proposes an extension
> > > > > > > to KIP-405. This is my first KIP with the Apache Kafka
> > > > > > > community, so any feedback would be highly appreciated.
> > > > > > >
> > > > > > > Cheers!
> > > > > > > --
> > > > > > > Divij Vaidya
> > > > > > > Sr. Software Engineer
> > > > > > > Amazon
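The shape of the change discussed in this thread can be sketched as follows. The interface, method, and class names here are assumptions for illustration only, not the final KIP-852 signatures; the real RemoteLogMetadataManager works with Kafka's metadata types rather than plain strings:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the proposed API shape: the RLMM plugin answers
// the total-size question directly, instead of the broker deriving it by
// iterating all segment metadata on every size-based retention run.
interface RemoteLogSizeProvider {
    long getRemoteLogSize(String topicIdPartition);
}

// A trivial in-memory implementation that maintains the size
// incrementally, the kind of strategy a custom RLMM could use internally
// (e.g., updating a counter as segments are copied and deleted).
class InMemoryRemoteLogSizeProvider implements RemoteLogSizeProvider {
    private final Map<String, Long> sizeByPartition = new HashMap<>();

    void onSegmentAdded(String partition, long segmentSizeInBytes) {
        sizeByPartition.merge(partition, segmentSizeInBytes, Long::sum);
    }

    void onSegmentDeleted(String partition, long segmentSizeInBytes) {
        sizeByPartition.merge(partition, -segmentSizeInBytes, Long::sum);
    }

    @Override
    public long getRemoteLogSize(String topicIdPartition) {
        // O(1): no iteration over segment metadata. The same value can
        // back a RemoteLogSizeBytes metric at no extra cost, as noted in
        // the thread above.
        return sizeByPartition.getOrDefault(topicIdPartition, 0L);
    }
}
```

With this shape, each size-based retention run costs one lookup plus the metadata reads for the segments actually being deleted, matching the complexity of time-based retention.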