Hi, Divij,

Thanks for the explanation.
1. Hmm, the default implementation of RLMM does local caching, right?
Currently, we also cache all segment metadata in the brokers without
KIP-405. Do you see a need to change that?

2, 3, 4: Yes, your explanation makes sense. However, currently,
RemoteLogMetadataManager.listRemoteLogSegments() doesn't specify a
particular order for the iterator. Do you intend to change that?

Thanks,

Jun

On Tue, Nov 8, 2022 at 3:31 AM Divij Vaidya <divijvaidy...@gmail.com> wrote:

> Hey Jun
>
> Thank you for your comments.
>
> *1. "RLMM implementor could ensure that listRemoteLogSegments() is fast"*
> This would be ideal, but pragmatically it is difficult to ensure that
> listRemoteLogSegments() is fast. The possibility of a large number of
> segments (much larger than what Kafka currently handles with local
> storage today) makes it infeasible to adopt strategies such as local
> caching to improve the performance of listRemoteLogSegments(). Apart
> from caching (which won't work due to size limitations), I can't think
> of other strategies that would eliminate the need for I/O operations
> proportional to the total number of segments. Please advise if you have
> something in mind.
>
> 2. *"If the size exceeds the retention size, we need to determine the
> subset of segments to delete to bring the size within the retention
> limit. Do we need to call
> RemoteLogMetadataManager.listRemoteLogSegments() to determine that?"*
> Yes, we need to call listRemoteLogSegments() to determine which segments
> should be deleted. But there is a difference from the use case we are
> trying to optimize with this KIP. To determine the subset of segments to
> delete, we only read metadata for the segments which would be deleted
> via listRemoteLogSegments(). But to determine the totalLogSize, which is
> required every time the size-based retention logic executes, we read the
> metadata of *all* the segments in remote storage.
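The cost asymmetry described in point 2 can be sketched as follows. The types and helper names below are hypothetical stand-ins for illustration, not the actual KIP-405 interfaces, and the sketch assumes the iterator returns segments oldest-first (the ordering question raised in this thread):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified sketch of why size-based retention reads *all* segment
// metadata while the deletion step only reads part of it.
class SizeRetentionSketch {

    // Hypothetical stand-in for the KIP-405 segment metadata type.
    static class SegmentMetadata {
        final long segmentSizeInBytes;
        SegmentMetadata(long segmentSizeInBytes) {
            this.segmentSizeInBytes = segmentSizeInBytes;
        }
    }

    // Stand-in for RemoteLogMetadataManager.listRemoteLogSegments();
    // in a real RLMM each call may involve remote I/O.
    static Iterator<SegmentMetadata> listRemoteLogSegments(List<SegmentMetadata> store) {
        return store.iterator();
    }

    // Computing totalLogSize today: one full pass over every segment's
    // metadata, O(total segments), on every size-based retention run.
    static long totalLogSize(List<SegmentMetadata> store) {
        long total = 0;
        Iterator<SegmentMetadata> it = listRemoteLogSegments(store);
        while (it.hasNext()) {
            total += it.next().segmentSizeInBytes;
        }
        return total;
    }

    // By contrast, choosing the deletion subset only touches the
    // oldest segments that will actually be deleted.
    static List<SegmentMetadata> segmentsToDelete(List<SegmentMetadata> store,
                                                  long retentionBytes) {
        long excess = totalLogSize(store) - retentionBytes;
        List<SegmentMetadata> toDelete = new ArrayList<>();
        Iterator<SegmentMetadata> it = listRemoteLogSegments(store);
        while (it.hasNext() && excess > 0) {
            SegmentMetadata s = it.next();
            toDelete.add(s);
            excess -= s.segmentSizeInBytes;
        }
        return toDelete;
    }
}
```

With three 100-byte segments and a 150-byte retention limit, `totalLogSize` iterates all three segments, while the deletion loop stops after the two oldest.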
> Hence, the number of results returned by
> *RemoteLogMetadataManager.listRemoteLogSegments()* is different when we
> are calculating totalLogSize vs. when we are determining the subset of
> segments to delete.
>
> 3. *"Also, what about time-based retention? To make that efficient, do
> we need to make some additional interface changes?"*
> No. Note that the time complexity of determining the segments for
> retention is different for time-based vs. size-based retention. For
> time-based retention, the time complexity is a function of the number of
> segments which are "eligible for deletion" (since we only read metadata
> for segments which would be deleted), whereas for size-based retention,
> the time complexity is a function of "all segments" available in remote
> storage (the metadata of all segments needs to be read to calculate the
> total size). As you may observe, this KIP will bring the time complexity
> of time-based retention and size-based retention to the same function.
>
> 4. Also, please note that the new API introduced in this KIP enables us
> to provide a metric for the total size of data stored in remote storage.
> Without the API, calculating this metric would be very expensive with
> *listRemoteLogSegments()*.
> I understand that your motivation here is to avoid polluting the
> interface with optimization-specific APIs, and I agree with that goal.
> But I believe that the new API proposed in the KIP brings a significant
> improvement, and there is no other workaround available to achieve the
> same performance.
>
> Regards,
> Divij Vaidya
>
> On Tue, Nov 8, 2022 at 12:12 AM Jun Rao <j...@confluent.io.invalid> wrote:
>
> > Hi, Divij,
> >
> > Thanks for the KIP. Sorry for the late reply.
> >
> > The motivation of the KIP is to improve the efficiency of size-based
> > retention. I am not sure the proposed changes are enough.
> > For example, if the size exceeds the retention size, we need to
> > determine the subset of segments to delete to bring the size within
> > the retention limit. Do we need to call
> > RemoteLogMetadataManager.listRemoteLogSegments() to determine that?
> > Also, what about time-based retention? To make that efficient, do we
> > need to make some additional interface changes?
> >
> > An alternative approach is for the RLMM implementor to make sure
> > that RemoteLogMetadataManager.listRemoteLogSegments() is fast (e.g.,
> > with local caching). This way, we could keep the interface simple.
> > Have we considered that?
> >
> > Thanks,
> >
> > Jun
> >
> > On Wed, Sep 28, 2022 at 6:28 AM Divij Vaidya <divijvaidy...@gmail.com>
> > wrote:
> >
> > > Hey folks
> > >
> > > Does anyone else have any thoughts on this before I propose this
> > > for a vote?
> > >
> > > --
> > > Divij Vaidya
> > >
> > > On Mon, Sep 5, 2022 at 12:57 PM Satish Duggana <satish.dugg...@gmail.com>
> > > wrote:
> > >
> > > > Thanks for the KIP Divij!
> > > >
> > > > This is a nice improvement to avoid recalculation of the size.
> > > > Customized RLMMs can implement the best possible approach by
> > > > caching or maintaining the size in an efficient way. But this is
> > > > not a big concern for the default topic-based RLMM, as mentioned
> > > > in the KIP.
> > > >
> > > > ~Satish.
> > > >
> > > > On Wed, 13 Jul 2022 at 18:48, Divij Vaidya <divijvaidy...@gmail.com>
> > > > wrote:
> > > >
> > > > > Thank you for your review, Luke.
> > > > >
> > > > > > Reg: would the new `RemoteLogSizeBytes` metric be a
> > > > > > performance overhead? Although we move the calculation to a
> > > > > > separate API, we still can't assume users will implement a
> > > > > > light-weight method, right?
> > > > > This metric would be logged using information that is already
> > > > > being calculated for handling the remote retention logic;
> > > > > hence, no additional work is required to calculate this metric.
> > > > > More specifically, whenever RemoteLogManager calls the
> > > > > getRemoteLogSize API, this metric would be captured. This API
> > > > > call is made every time RemoteLogManager wants to handle
> > > > > expired remote log segments (which should be periodic). Does
> > > > > that address your concern?
> > > > >
> > > > > Divij Vaidya
> > > > >
> > > > > On Tue, Jul 12, 2022 at 11:01 AM Luke Chen <show...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Divij,
> > > > > >
> > > > > > Thanks for the KIP!
> > > > > >
> > > > > > I think it makes sense to delegate the responsibility of the
> > > > > > calculation to the specific RemoteLogMetadataManager
> > > > > > implementation. But one thing I'm not quite sure about is
> > > > > > whether the new `RemoteLogSizeBytes` metric would be a
> > > > > > performance overhead. Although we move the calculation to a
> > > > > > separate API, we still can't assume users will implement a
> > > > > > light-weight method, right?
> > > > > >
> > > > > > Thank you.
> > > > > > Luke
> > > > > >
> > > > > > On Fri, Jul 1, 2022 at 5:47 PM Divij Vaidya <divijvaidy...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-852%3A+Optimize+calculation+of+size+for+log+in+remote+tier
> > > > > > >
> > > > > > > Hey folks
> > > > > > >
> > > > > > > Please take a look at this KIP, which proposes an extension
> > > > > > > to KIP-405. This is my first KIP with the Apache Kafka
> > > > > > > community, so any feedback would be highly appreciated.
> > > > > > >
> > > > > > > Cheers!
> > > > > > > --
> > > > > > > Divij Vaidya
> > > > > > > Sr. Software Engineer
> > > > > > > Amazon
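The shape of the change discussed in this thread can be sketched as follows. The interface, method, and class names here are assumptions for illustration only, not the final KIP-852 signatures; the real RemoteLogMetadataManager works with Kafka's metadata types rather than plain strings:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the proposed API shape: the RLMM plugin answers
// the total-size question directly, instead of the broker deriving it by
// iterating all segment metadata on every size-based retention run.
interface RemoteLogSizeProvider {
    long getRemoteLogSize(String topicIdPartition);
}

// A trivial in-memory implementation that maintains the size
// incrementally, the kind of strategy a custom RLMM could use internally
// (e.g., updating a counter as segments are copied and deleted).
class InMemoryRemoteLogSizeProvider implements RemoteLogSizeProvider {
    private final Map<String, Long> sizeByPartition = new HashMap<>();

    void onSegmentAdded(String partition, long segmentSizeInBytes) {
        sizeByPartition.merge(partition, segmentSizeInBytes, Long::sum);
    }

    void onSegmentDeleted(String partition, long segmentSizeInBytes) {
        sizeByPartition.merge(partition, -segmentSizeInBytes, Long::sum);
    }

    @Override
    public long getRemoteLogSize(String topicIdPartition) {
        // O(1): no iteration over segment metadata. The same value can
        // back a RemoteLogSizeBytes metric at no extra cost, as noted in
        // the thread above.
        return sizeByPartition.getOrDefault(topicIdPartition, 0L);
    }
}
```

With this shape, each size-based retention run costs one lookup plus the metadata reads for the segments actually being deleted, matching the complexity of time-based retention.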