Hey Jun,

Thank you for your comments.
1. *"RLMM implementor could ensure that listRemoteLogSegments() is fast"*

This would be ideal, but pragmatically it is difficult to ensure that listRemoteLogSegments() is fast. The number of segments in remote storage can be much larger than what Kafka handles with local storage today, which makes strategies such as local caching infeasible for improving the performance of listRemoteLogSegments(). Apart from caching (which won't work due to the size limitations), I can't think of other strategies that would eliminate the need for I/O operations proportional to the total number of segments. Please advise if you have something in mind.

2. *"If the size exceeds the retention size, we need to determine the subset of segments to delete to bring the size within the retention limit. Do we need to call RemoteLogMetadataManager.listRemoteLogSegments() to determine that?"*

Yes, we need to call listRemoteLogSegments() to determine which segments should be deleted. But that is a different use case from the one this KIP optimizes. To determine the subset of segments to delete, we only read metadata (via listRemoteLogSegments()) for the segments that will actually be deleted. To determine totalLogSize, however, which is required every time the size-based retention logic executes, we read the metadata of *all* the segments in remote storage. Hence, the number of results returned by *RemoteLogMetadataManager.listRemoteLogSegments()* is very different when we are calculating totalLogSize vs. when we are determining the subset of segments to delete.

3. *"Also, what about time-based retention? To make that efficient, do we need to make some additional interface changes?"*

No. Note that the time complexity of determining the segments for retention differs between time-based and size-based retention.
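To make points 1 and 2 above concrete, here is a minimal sketch. The types below are hypothetical, simplified stand-ins for the KIP-405 interfaces (the names and shapes are my assumptions for illustration, not the actual Kafka API); it shows why computing the total size through listRemoteLogSegments() costs I/O proportional to the total segment count, whereas a dedicated getRemoteLogSize() lets the implementation answer from a counter it maintains:

```java
import java.util.Iterator;

// Hypothetical, simplified stand-in for segment metadata (not the real API).
record SegmentMetadata(long sizeInBytes) { }

// Hypothetical, simplified stand-in for the RLMM interface.
interface RlmmSketch {
    // KIP-405 style: iterates the metadata of every segment of the partition.
    Iterator<SegmentMetadata> listRemoteLogSegments();

    // KIP-852 proposal: the implementation may answer from a counter it
    // maintains as segments are added/deleted, avoiding a full scan.
    long getRemoteLogSize();
}

class SizeBasedRetention {
    // Without the new API, every size-based retention pass performs reads
    // proportional to the *total* number of segments just to learn the size.
    static long totalSizeViaListing(RlmmSketch rlmm) {
        long total = 0;
        Iterator<SegmentMetadata> it = rlmm.listRemoteLogSegments();
        while (it.hasNext()) {
            total += it.next().sizeInBytes();
        }
        return total;
    }
}
```

The point of the KIP is that the loop above runs on every size-based retention pass regardless of how many segments are actually eligible for deletion, while getRemoteLogSize() can be O(1) for an implementation that tracks the size incrementally.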
For time-based retention, the time complexity is a function of the number of segments that are "eligible for deletion" (since we only read metadata for segments which would be deleted), whereas for size-based retention it is a function of "all segments" available in remote storage (metadata of all segments needs to be read to calculate the total size). As you may observe, this KIP brings the time complexity of both time-based and size-based retention down to the same function.

4. Also, please note that the new API introduced in this KIP enables us to provide a metric for the total size of data stored in remote storage. Without the API, calculating this metric via *listRemoteLogSegments()* would be very expensive.

I understand that your motivation here is to avoid polluting the interface with optimization-specific APIs, and I agree with that goal. But I believe that the new API proposed in this KIP brings a significant improvement, and there is no workaround available to achieve the same performance.

Regards,
Divij Vaidya

On Tue, Nov 8, 2022 at 12:12 AM Jun Rao <j...@confluent.io.invalid> wrote:

> Hi, Divij,
>
> Thanks for the KIP. Sorry for the late reply.
>
> The motivation of the KIP is to improve the efficiency of size based
> retention. I am not sure the proposed changes are enough. For example, if
> the size exceeds the retention size, we need to determine the subset of
> segments to delete to bring the size within the retention limit. Do we
> need to call RemoteLogMetadataManager.listRemoteLogSegments() to determine
> that? Also, what about time-based retention? To make that efficient, do we
> need to make some additional interface changes?
>
> An alternative approach is for the RLMM implementor to make sure
> that RemoteLogMetadataManager.listRemoteLogSegments() is fast (e.g., with
> local caching). This way, we could keep the interface simple. Have we
> considered that?
>
> Thanks,
>
> Jun
>
> On Wed, Sep 28, 2022 at 6:28 AM Divij Vaidya <divijvaidy...@gmail.com>
> wrote:
>
> > Hey folks
> >
> > Does anyone else have any thoughts on this before I propose this for a
> > vote?
> >
> > --
> > Divij Vaidya
> >
> > On Mon, Sep 5, 2022 at 12:57 PM Satish Duggana <satish.dugg...@gmail.com>
> > wrote:
> >
> > > Thanks for the KIP Divij!
> > >
> > > This is a nice improvement to avoid recalculation of size. Customized
> > > RLMMs can implement the best possible approach by caching or
> > > maintaining the size in an efficient way. But this is not a big
> > > concern for the default topic based RLMM as mentioned in the KIP.
> > >
> > > ~Satish.
> > >
> > > On Wed, 13 Jul 2022 at 18:48, Divij Vaidya <divijvaidy...@gmail.com>
> > > wrote:
> > >
> > > > Thank you for your review Luke.
> > > >
> > > > > Reg: is that would the new `RemoteLogSizeBytes` metric be a
> > > > > performance overhead? Although we move the calculation to a
> > > > > seperate API, we still can't assume users will implement a
> > > > > light-weight method, right?
> > > >
> > > > This metric would be logged using the information that is already
> > > > being calculated for handling remote retention logic, hence, no
> > > > additional work is required to calculate this metric. More
> > > > specifically, whenever RemoteLogManager calls getRemoteLogSize API,
> > > > this metric would be captured. This API call is made every time
> > > > RemoteLogManager wants to handle expired remote log segments (which
> > > > should be periodic). Does that address your concern?
> > > >
> > > > Divij Vaidya
> > > >
> > > > On Tue, Jul 12, 2022 at 11:01 AM Luke Chen <show...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Divij,
> > > > >
> > > > > Thanks for the KIP!
> > > > >
> > > > > I think it makes sense to delegate the responsibility of
> > > > > calculation to the specific RemoteLogMetadataManager
> > > > > implementation.
> > > > > But one thing I'm not quite sure, is that would the new
> > > > > `RemoteLogSizeBytes` metric be a performance overhead?
> > > > > Although we move the calculation to a seperate API, we still can't
> > > > > assume users will implement a light-weight method, right?
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > > > On Fri, Jul 1, 2022 at 5:47 PM Divij Vaidya
> > > > > <divijvaidy...@gmail.com> wrote:
> > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-852%3A+Optimize+calculation+of+size+for+log+in+remote+tier
> > > > > >
> > > > > > Hey folks
> > > > > >
> > > > > > Please take a look at this KIP which proposes an extension to
> > > > > > KIP-405. This is my first KIP with Apache Kafka community so any
> > > > > > feedback would be highly appreciated.
> > > > > >
> > > > > > Cheers!
> > > > > >
> > > > > > --
> > > > > > Divij Vaidya
> > > > > > Sr. Software Engineer
> > > > > > Amazon