Hey Jun

It has been a while since this KIP got some attention. While we wait for
Satish to chime in here, perhaps I can answer your question.

> Could you explain how you exposed the log size in your KIP-405
> implementation?

The APIs available in RLMM as per KIP-405
are addRemoteLogSegmentMetadata(), updateRemoteLogSegmentMetadata(),
remoteLogSegmentMetadata(), highestOffsetForEpoch(),
putRemotePartitionDeleteMetadata(), listRemoteLogSegments(),
onPartitionLeadershipChanges()
and onStopPartitions(). None of these APIs allows us to expose the log size;
hence, the only remaining option is to list all segments using
listRemoteLogSegments() and aggregate their sizes every time we need to
calculate the total. Based on our prior discussion, this requires reading
all segment metadata, which won't work for non-local RLMM implementations.
Satish's implementation also performs a full scan and calculates the
aggregate; see:
https://github.com/satishd/kafka/blob/2.8.x-tiered-storage/core/src/main/scala/kafka/log/remote/RemoteLogManager.scala#L619
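
For reference, that full-scan aggregation looks roughly like the following
(an illustrative Java sketch against the KIP-405 RemoteLogMetadataManager
interface, not the actual RemoteLogManager code):

    // Illustrative sketch: compute the remote log size by scanning all
    // segment metadata for the partition. The cost grows with the number of
    // remote segments and, for a database-backed RLMM, with network calls.
    long remoteLogSizeBytes(RemoteLogMetadataManager rlmm, TopicIdPartition tp)
            throws RemoteStorageException {
        long totalSize = 0L;
        Iterator<RemoteLogSegmentMetadata> segments = rlmm.listRemoteLogSegments(tp);
        while (segments.hasNext()) {
            totalSize += segments.next().segmentSizeInBytes();
        }
        return totalSize;
    }

This is exactly the pattern the KIP is trying to avoid running on every
size-based retention check.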


Does this answer your question?

--
Divij Vaidya



On Tue, Dec 20, 2022 at 8:40 PM Jun Rao <j...@confluent.io.invalid> wrote:

> Hi, Divij,
>
> Thanks for the explanation.
>
> Good question.
>
> Hi, Satish,
>
> Could you explain how you exposed the log size in your KIP-405
> implementation?
>
> Thanks,
>
> Jun
>
> On Tue, Dec 20, 2022 at 4:59 AM Divij Vaidya <divijvaidy...@gmail.com>
> wrote:
>
> > Hey Jun
> >
> > Yes, it is possible to maintain the log size in the cache (see rejected
> > alternative #3 in the KIP) but I did not understand how it is possible to
> > retrieve it without the new API. The log size could be calculated on
> > startup by scanning through the segments (though I would disagree that
> > this is the right approach since the scan itself takes on the order of
> > minutes and hence delays the start of the archival process), and
> > incrementally maintained afterwards. Even then, we would need an API in
> > RemoteLogMetadataManager so that RLM could fetch the cached size!
> >
> > If we wish to cache the size without adding a new API, then we need to
> > cache the size in RLM itself (instead of the RLMM implementation) and
> > incrementally manage it. The downside of a longer archival start time at
> > startup (due to the initial scan) still remains valid in this situation.
> >
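> > For clarity, the new API I am proposing is roughly of the following shape
> > (a simplified sketch; please see the KIP for the exact signature):
> >
> >     // Total size (in bytes) of remote segments for this partition whose
> >     // leader epochs lie in the given epoch lineage.
> >     long getRemoteLogSize(TopicIdPartition topicIdPartition, int leaderEpoch)
> >         throws RemoteStorageException;
> >
> > A database-backed RLMM can then answer the size question with a single
> > aggregate query instead of streaming every segment's metadata to the broker.
> >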
> > --
> > Divij Vaidya
> >
> >
> >
> > On Fri, Dec 16, 2022 at 12:43 AM Jun Rao <j...@confluent.io.invalid>
> wrote:
> >
> > > Hi, Divij,
> > >
> > > Thanks for the explanation.
> > >
> > > If there is an in-memory cache, could we maintain the log size in the
> > > cache with the existing API? For example, a replica could make a
> > > listRemoteLogSegments(TopicIdPartition topicIdPartition) call on startup
> > > to get the remote segment size before the current leaderEpoch. The
> > > leader could then maintain the size incrementally afterwards. On leader
> > > change, other replicas can make a listRemoteLogSegments(TopicIdPartition
> > > topicIdPartition, int leaderEpoch) call to get the size of newly
> > > generated segments.
> > >
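> > > Roughly, I have something like the following in mind (an illustrative
> > > sketch; the names are made up):
> > >
> > >     // On startup: one-time scan to seed the cached size.
> > >     long cachedRemoteSizeBytes = 0L;
> > >     Iterator<RemoteLogSegmentMetadata> it = rlmm.listRemoteLogSegments(tp);
> > >     while (it.hasNext()) {
> > >         cachedRemoteSizeBytes += it.next().segmentSizeInBytes();
> > >     }
> > >     // Afterwards, the leader adjusts cachedRemoteSizeBytes incrementally
> > >     // as segments are copied to or deleted from remote storage.
> > >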
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Wed, Dec 14, 2022 at 3:27 AM Divij Vaidya <divijvaidy...@gmail.com>
> > > wrote:
> > >
> > > > > Is the new method enough for doing size-based retention?
> > > >
> > > > Yes. You are right in assuming that this API only provides the Remote
> > > > storage size (for current epoch chain). We would use this API for size
> > > > based retention along with a value of localOnlyLogSegmentSize which is
> > > > computed as Log.sizeInBytes(logSegments.filter(_.baseOffset >
> > > > highestOffsetWithRemoteIndex)). Hence, (total_log_size =
> > > > remoteLogSizeBytes + log.localOnlyLogSegmentSize). I have updated the
> > > > KIP with this information. You can also check an example implementation
> > > > at
> > > > https://github.com/satishd/kafka/blob/2.8.x-tiered-storage/core/src/main/scala/kafka/log/Log.scala#L2077
> > > >
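> > > > In other words, the size-based retention check would look roughly like
> > > > this (an illustrative sketch with made-up local variable names, not the
> > > > actual Log/RemoteLogManager code):
> > > >
> > > >     // Remote size for the current leader epoch chain, via the new API.
> > > >     long remoteLogSizeBytes = rlmm.getRemoteLogSize(topicIdPartition, currentLeaderEpoch);
> > > >     // localOnlyLogSegmentSize is computed from local segments as above.
> > > >     long totalLogSize = remoteLogSizeBytes + localOnlyLogSegmentSize;
> > > >     long overflowBytes = totalLogSize - retentionSizeBytes;
> > > >     // The oldest remote segments are then deleted until overflowBytes <= 0.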
> > > >
> > > > > Do you imagine all accesses to remote metadata will be across the
> > > > > network or will there be some local in-memory cache?
> > > >
> > > > I would expect a disk-less implementation to maintain a finite in-memory
> > > > cache for segment metadata to optimize the number of network calls made
> > > > to fetch the data. In future, we can think about bringing this finite
> > > > size cache into RLM itself but that's probably a conversation for a
> > > > different KIP. There are many other things we would like to do to
> > > > optimize the Tiered Storage interface, such as introducing a circular
> > > > buffer / streaming interface from RSM (so that we don't have to wait to
> > > > fetch the entire segment before starting to send records to the
> > > > consumer), caching the segments fetched from RSM locally (I would assume
> > > > all RSM plugin implementations do this, might as well add it to RLM),
> > > > etc.
> > > >
> > > > --
> > > > Divij Vaidya
> > > >
> > > >
> > > >
> > > > On Mon, Dec 12, 2022 at 7:35 PM Jun Rao <j...@confluent.io.invalid>
> > > wrote:
> > > >
> > > > > Hi, Divij,
> > > > >
> > > > > Thanks for the reply.
> > > > >
> > > > > Is the new method enough for doing size-based retention? It gives the
> > > > > total size of the remote segments, but it seems that we still don't
> > > > > know the exact total size for a log since there could be overlapping
> > > > > segments between the remote and the local segments.
> > > > >
> > > > > You mentioned a disk-less implementation. Do you imagine all accesses
> > > > > to remote metadata will be across the network or will there be some
> > > > > local in-memory cache?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Dec 7, 2022 at 3:10 AM Divij Vaidya <
> divijvaidy...@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > The method is needed for RLMM implementations which fetch the
> > > > > > information over the network and not for the disk-based
> > > > > > implementations (such as the default topic-based RLMM).
> > > > > >
> > > > > > I would argue that adding this API makes the interface more generic
> > > > > > than what it is today. This is because, with the current APIs, an
> > > > > > implementor is restricted to disk-based RLMM solutions only (i.e. the
> > > > > > default solution), whereas if we add this new API, we unblock usage
> > > > > > of network-based RLMM implementations such as databases.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed 30. Nov 2022 at 20:40, Jun Rao <j...@confluent.io.invalid>
> > > > wrote:
> > > > > >
> > > > > > > Hi, Divij,
> > > > > > >
> > > > > > > Thanks for the reply.
> > > > > > >
> > > > > > > Point#2. My high-level question is whether the new method is
> > > > > > > needed for every implementation of remote storage or just for a
> > > > > > > specific implementation. The issues that you pointed out exist for
> > > > > > > the default implementation of RLMM as well and, so far, the default
> > > > > > > implementation hasn't found a need for a similar new method. For a
> > > > > > > public interface, ideally we want to make it more general.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > > On Mon, Nov 21, 2022 at 7:11 AM Divij Vaidya <
> > > > divijvaidy...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Thank you Jun and Alex for your comments.
> > > > > > > >
> > > > > > > > Point#1: You are right Jun. As Alex mentioned, the "derived
> > > > metadata"
> > > > > > can
> > > > > > > > increase the size of cached metadata by a factor of 10 but it
> > > > should
> > > > > be
> > > > > > > ok
> > > > > > > > to cache just the actual metadata. My point about size being
> a
> > > > > > limitation
> > > > > > > > for using cache is not valid anymore.
> > > > > > > >
> > > > > > > > Point#2: For a new replica, it would still have to fetch the
> > > > metadata
> > > > > > > over
> > > > > > > > the network to initiate the warm up of the cache and hence,
> > > > increase
> > > > > > the
> > > > > > > > start time of the archival process. Please also note the
> > > > > repercussions
> > > > > > of
> > > > > > > > the warm up scan that Alex mentioned in this thread as part
> of
> > > > > #102.2.
> > > > > > > >
> > > > > > > > 100#: Agreed Alex. Thanks for clarifying that. My point about
> > > size
> > > > > > being
> > > > > > > a
> > > > > > > > limitation for using cache is not valid anymore.
> > > > > > > >
> > > > > > > > 101#: Alex, if I understand correctly, you are suggesting to
> > > cache
> > > > > the
> > > > > > > > total size at the leader and update it on archival. This
> > wouldn't
> > > > > work
> > > > > > > for
> > > > > > > > cases when the leader restarts where we would have to make a
> > full
> > > > > scan
> > > > > > > > to update the total size entry on startup. We expect users to
> > > store
> > > > > > data
> > > > > > > > over longer duration in remote storage which increases the
> > > > likelihood
> > > > > > of
> > > > > > > > leader restarts / failovers.
> > > > > > > >
> > > > > > > > 102#.1: I don't think that the current design accommodates
> the
> > > fact
> > > > > > that
> > > > > > > > data corruption could happen at the RLMM plugin (we don't
> have
> > > > > checksum
> > > > > > > as
> > > > > > > > a field in metadata as part of KIP405). If data corruption
> > > occurs,
> > > > w/
> > > > > > or
> > > > > > > > w/o the cache, it would be a different problem to solve. I
> > would
> > > > like
> > > > > > to
> > > > > > > > keep this outside the scope of this KIP.
> > > > > > > >
> > > > > > > > 102#.2: Agree. This remains as the main concern for using the
> > > cache
> > > > > to
> > > > > > > > fetch total size.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Divij Vaidya
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Nov 18, 2022 at 12:59 PM Alexandre Dupriez <
> > > > > > > > alexandre.dupr...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Divij,
> > > > > > > > >
> > > > > > > > > Thanks for the KIP. Please find some comments based on
> what I
> > > > read
> > > > > on
> > > > > > > > > this thread so far - apologies for the repeats and the late
> > > > reply.
> > > > > > > > >
> > > > > > > > > If I understand correctly, one of the main elements of
> > > discussion
> > > > > is
> > > > > > > > > about caching in Kafka versus delegation of providing the
> > > remote
> > > > > size
> > > > > > > > > of a topic-partition to the plugin.
> > > > > > > > >
> > > > > > > > > A few comments:
> > > > > > > > >
> > > > > > > > > 100. The size of the “derived metadata” which is managed by
> > the
> > > > > > plugin
> > > > > > > > > to represent an rlmMetadata can indeed be close to 1 kB on
> > > > average
> > > > > > > > > depending on its own internal structure, e.g. the
> redundancy
> > it
> > > > > > > > > enforces (unfortunately resulting to duplication),
> additional
> > > > > > > > > information such as checksums and primary and secondary
> > > indexable
> > > > > > > > > keys. But indeed, the rlmMetadata is itself a lighter data
> > > > > structure
> > > > > > > > > by a factor of 10. And indeed, instead of caching the
> > “derived
> > > > > > > > > metadata”, only the rlmMetadata could be, which should
> > address
> > > > the
> > > > > > > > > concern regarding the memory occupancy of the cache.
> > > > > > > > >
> > > > > > > > > 101. I am not sure I fully understand why we would need to
> > > cache
> > > > > the
> > > > > > > > > list of rlmMetadata to retain the remote size of a
> > > > topic-partition.
> > > > > > > > > Since the leader of a topic-partition is, in
> non-degenerated
> > > > cases,
> > > > > > > > > the only actor which can mutate the remote part of the
> > > > > > > > > topic-partition, hence its size, it could in theory only
> > cache
> > > > the
> > > > > > > > > size of the remote log once it has calculated it? In which
> > case
> > > > > there
> > > > > > > > > would not be any problem regarding the size of the caching
> > > > > strategy.
> > > > > > > > > Did I miss something there?
> > > > > > > > >
> > > > > > > > > 102. There may be a few challenges to consider with
> caching:
> > > > > > > > >
> > > > > > > > > 102.1) As mentioned above, the caching strategy assumes no
> > > > mutation
> > > > > > > > > outside the lifetime of a leader. While this is true in the
> > > > normal
> > > > > > > > > course of operation, there could be accidental mutation
> > outside
> > > > of
> > > > > > the
> > > > > > > > > leader and a loss of consistency between the cached state
> and
> > > the
> > > > > > > > > actual remote representation of the log. E.g. split-brain
> > > > > scenarios,
> > > > > > > > > bugs in the plugins, bugs in external systems with mutating
> > > > access
> > > > > on
> > > > > > > > > the derived metadata. In the worst case, a drift between
> the
> > > > cached
> > > > > > > > > size and the actual size could lead to over-deleting remote
> > > data
> > > > > > which
> > > > > > > > > is a durability risk.
> > > > > > > > >
> > > > > > > > > The alternative you propose, by making the plugin the
> source
> > of
> > > > > truth
> > > > > > > > > w.r.t. to the size of the remote log, can make it easier to
> > > avoid
> > > > > > > > > inconsistencies between plugin-managed metadata and the
> > remote
> > > > log
> > > > > > > > > from the perspective of Kafka. On the other hand, plugin
> > > vendors
> > > > > > would
> > > > > > > > > have to implement it with the expected efficiency to have
> it
> > > > yield
> > > > > > > > > benefits.
> > > > > > > > >
> > > > > > > > > 102.2) As you mentioned, the caching strategy in Kafka
> would
> > > > still
> > > > > > > > > require one iteration over the list of rlmMetadata when the
> > > > > > leadership
> > > > > > > > > of a topic-partition is assigned to a broker, while the
> > plugin
> > > > can
> > > > > > > > > offer alternative constant-time approaches. This
> calculation
> > > > cannot
> > > > > > be
> > > > > > > > > put on the LeaderAndIsr path and would be performed in the
> > > > > > background.
> > > > > > > > > In case of bulk leadership migration, listing the
> rlmMetadata
> > > > could
> > > > > > a)
> > > > > > > > > result in request bursts to any backend system the plugin
> may
> > > use
> > > > > > > > > [which shouldn’t be a problem for high-throughput data
> stores
> > > but
> > > > > > > > > could have cost implications] b) increase utilisation
> > timespan
> > > of
> > > > > the
> > > > > > > > > RLM threads for these calculations potentially leading to
> > > > transient
> > > > > > > > > starvation of tasks queued for, typically, offloading
> > > operations
> > > > c)
> > > > > > > > > could have a non-marginal CPU footprint on hardware with
> > strict
> > > > > > > > > resource constraints. All these elements could have an
> impact
> > > to
> > > > > some
> > > > > > > > > degree depending on the operational environment.
> > > > > > > > >
> > > > > > > > > From a design perspective, one question is where we want
> the
> > > > source
> > > > > > of
> > > > > > > > > truth w.r.t. remote log size to be during the lifetime of a
> > > > leader.
> > > > > > > > > The responsibility of maintaining a consistent
> representation
> > > of
> > > > > the
> > > > > > > > > remote log is shared by Kafka and the plugin. Which system
> is
> > > > best
> > > > > > > > > placed to maintain such a state while providing the highest
> > > > > > > > > consistency guarantees is something both Kafka and plugin
> > > > designers
> > > > > > > > > could help understand better.
> > > > > > > > >
> > > > > > > > > Many thanks,
> > > > > > > > > Alexandre
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Le jeu. 17 nov. 2022 à 19:27, Jun Rao
> > <j...@confluent.io.invalid
> > > >
> > > > a
> > > > > > > > écrit :
> > > > > > > > > >
> > > > > > > > > > Hi, Divij,
> > > > > > > > > >
> > > > > > > > > > Thanks for the reply.
> > > > > > > > > >
> > > > > > > > > > Point #1. Is the average remote segment metadata really
> > 1KB?
> > > > > What's
> > > > > > > > > listed
> > > > > > > > > > in the public interface is probably well below 100 bytes.
> > > > > > > > > >
> > > > > > > > > > Point #2. I guess you are assuming that each broker only
> > > caches
> > > > > the
> > > > > > > > > remote
> > > > > > > > > > segment metadata in memory. An alternative approach is to
> > > cache
> > > > > > them
> > > > > > > in
> > > > > > > > > > both memory and local disk. That way, on broker restart,
> > you
> > > > just
> > > > > > > need
> > > > > > > > to
> > > > > > > > > > fetch the new remote segments' metadata using the
> > > > > > > > > > listRemoteLogSegments(TopicIdPartition topicIdPartition,
> > int
> > > > > > > > leaderEpoch)
> > > > > > > > > > api. Will that work?
> > > > > > > > > >
> > > > > > > > > > Point #3. Thanks for the explanation and it sounds good.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Jun
> > > > > > > > > >
> > > > > > > > > > On Thu, Nov 17, 2022 at 7:31 AM Divij Vaidya <
> > > > > > > divijvaidy...@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Jun
> > > > > > > > > > >
> > > > > > > > > > > There are three points that I would like to present
> here:
> > > > > > > > > > >
> > > > > > > > > > > 1. We would require a large cache size to efficiently
> > cache
> > > > all
> > > > > > > > segment
> > > > > > > > > > > metadata.
> > > > > > > > > > > 2. Linear scan of all metadata at broker startup to
> > > populate
> > > > > the
> > > > > > > > cache
> > > > > > > > > will
> > > > > > > > > > > be slow and will impact the archival process.
> > > > > > > > > > > 3. There is no other use case where a full scan of
> > segment
> > > > > > metadata
> > > > > > > > is
> > > > > > > > > > > required.
> > > > > > > > > > >
> > > > > > > > > > > Let's start by quantifying 1. Here's my estimate for
> the
> > > size
> > > > > of
> > > > > > > the
> > > > > > > > > cache.
> > > > > > > > > > > Average size of segment metadata = 1KB. This could be
> > more
> > > if
> > > > > we
> > > > > > > have
> > > > > > > > > > > frequent leader failover with a large number of leader
> > > epochs
> > > > > > being
> > > > > > > > > stored
> > > > > > > > > > > per segment.
> > > > > > > > > > > Segment size = 100MB. Users will prefer to reduce the
> > > segment
> > > > > > size
> > > > > > > > > from the
> > > > > > > > > > > default value of 1GB to ensure timely archival of data
> > > since
> > > > > data
> > > > > > > > from
> > > > > > > > > > > active segment is not archived.
> > > > > > > > > > > Cache size = num segments * avg. segment metadata size
> =
> > > > > > > > > (100TB/100MB)*1KB
> > > > > > > > > > > = 1GB.
> > > > > > > > > > > While 1GB for cache may not sound like a large number
> for
> > > > > larger
> > > > > > > > > machines,
> > > > > > > > > > > it does eat into the memory as an additional cache and
> > > makes
> > > > > use
> > > > > > > > cases
> > > > > > > > > with
> > > > > > > > > > > large data retention with low throughout expensive
> (where
> > > > such
> > > > > > use
> > > > > > > > case
> > > > > > > > > > > would could use smaller machines).
> > > > > > > > > > >
> > > > > > > > > > > About point#2:
> > > > > > > > > > > Even if we say that all segment metadata can fit into
> the
> > > > > cache,
> > > > > > we
> > > > > > > > > will
> > > > > > > > > > > need to populate the cache on broker startup. It would
> > not
> > > be
> > > > > in
> > > > > > > the
> > > > > > > > > > > critical patch of broker startup and hence won't impact
> > the
> > > > > > startup
> > > > > > > > > time.
> > > > > > > > > > > But it will impact the time when we could start the
> > > archival
> > > > > > > process
> > > > > > > > > since
> > > > > > > > > > > the RLM thread pool will be blocked on the first call
> to
> > > > > > > > > > > listRemoteLogSegments(). To scan metadata for 1MM
> > segments
> > > > > > > (computed
> > > > > > > > > above)
> > > > > > > > > > > and transfer 1GB data over the network from a RLMM such
> > as
> > > a
> > > > > > remote
> > > > > > > > > > > database would be in the order of minutes (depending on
> > how
> > > > > > > efficient
> > > > > > > > > the
> > > > > > > > > > > scan is with the RLMM implementation). Although, I
> would
> > > > > concede
> > > > > > > that
> > > > > > > > > > > having RLM threads blocked for a few minutes is perhaps
> > OK
> > > > but
> > > > > if
> > > > > > > we
> > > > > > > > > > > introduce the new API proposed in the KIP, we would
> have
> > a
> > > > > > > > > > > deterministic startup time for RLM. Adding the API
> comes
> > > at a
> > > > > low
> > > > > > > > cost
> > > > > > > > > and
> > > > > > > > > > > I believe the trade off is worth it.
> > > > > > > > > > >
> > > > > > > > > > > About point#3:
> > > > > > > > > > > We can use listRemoteLogSegments(TopicIdPartition
> > > > > > topicIdPartition,
> > > > > > > > int
> > > > > > > > > > > leaderEpoch) to calculate the segments eligible for
> > > deletion
> > > > > > (based
> > > > > > > > on
> > > > > > > > > size
> > > > > > > > > > > retention) where leader epoch(s) belong to the current
> > > leader
> > > > > > epoch
> > > > > > > > > chain.
> > > > > > > > > > > I understand that it may lead to segments belonging to
> > > other
> > > > > > epoch
> > > > > > > > > lineage
> > > > > > > > > > > not getting deleted and would require a separate
> > mechanism
> > > to
> > > > > > > delete
> > > > > > > > > them.
> > > > > > > > > > > The separate mechanism would anyways be required to
> > delete
> > > > > these
> > > > > > > > > "leaked"
> > > > > > > > > > > segments as there are other cases which could lead to
> > leaks
> > > > > such
> > > > > > as
> > > > > > > > > network
> > > > > > > > > > > problems with RSM midway through writing a segment, etc.
> > > > > > > > > > >
> > > > > > > > > > > Thank you for the replies so far. They have made me
> > > re-think
> > > > my
> > > > > > > > > assumptions
> > > > > > > > > > > and this dialogue has been very constructive for me.
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > Divij Vaidya
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Nov 10, 2022 at 10:49 PM Jun Rao
> > > > > > <j...@confluent.io.invalid
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi, Divij,
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for the reply.
> > > > > > > > > > > >
> > > > > > > > > > > > It's true that the data in Kafka could be kept longer
> > > with
> > > > > > > KIP-405.
> > > > > > > > > How
> > > > > > > > > > > > much data do you envision to have per broker? For
> 100TB
> > > > data
> > > > > > per
> > > > > > > > > broker,
> > > > > > > > > > > > with 1GB segment and segment metadata of 100 bytes,
> it
> > > > > requires
> > > > > > > > > > > > 100TB/1GB*100 = 10MB, which should fit in memory.
> > > > > > > > > > > >
> > > > > > > > > > > > RemoteLogMetadataManager has two
> > listRemoteLogSegments()
> > > > > > methods.
> > > > > > > > > The one
> > > > > > > > > > > > you listed listRemoteLogSegments(TopicIdPartition
> > > > > > > topicIdPartition,
> > > > > > > > > int
> > > > > > > > > > > > leaderEpoch) does return data in offset order.
> However,
> > > the
> > > > > > other
> > > > > > > > > > > > one listRemoteLogSegments(TopicIdPartition
> > > > topicIdPartition)
> > > > > > > > doesn't
> > > > > > > > > > > > specify the return order. I assume that you need the
> > > latter
> > > > > to
> > > > > > > > > calculate
> > > > > > > > > > > > the segment size?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Jun
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Nov 10, 2022 at 10:25 AM Divij Vaidya <
> > > > > > > > > divijvaidy...@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > *Jun,*
> > > > > > > > > > > > >
> > > > > > > > > > > > > *"the default implementation of RLMM does local
> > > caching,
> > > > > > > right?"*
> > > > > > > > > > > > > Yes, Jun. The default implementation of RLMM does
> > > indeed
> > > > > > cache
> > > > > > > > the
> > > > > > > > > > > > segment
> > > > > > > > > > > > > metadata today, hence, it won't work for use cases
> > when
> > > > the
> > > > > > > > number
> > > > > > > > > of
> > > > > > > > > > > > > segments in remote storage is large enough to
> exceed
> > > the
> > > > > size
> > > > > > > of
> > > > > > > > > cache.
> > > > > > > > > > > > As
> > > > > > > > > > > > > part of this KIP, I will implement the new proposed
> > API
> > > > in
> > > > > > the
> > > > > > > > > default
> > > > > > > > > > > > > implementation of RLMM but the underlying
> > > implementation
> > > > > will
> > > > > > > > > still be
> > > > > > > > > > > a
> > > > > > > > > > > > > scan. I will pick up optimizing that in a separate
> > PR.
> > > > > > > > > > > > >
> > > > > > > > > > > > > *"we also cache all segment metadata in the brokers
> > > > without
> > > > > > > > > KIP-405. Do
> > > > > > > > > > > > you
> > > > > > > > > > > > > see a need to change that?"*
> > > > > > > > > > > > > Please correct me if I am wrong here but we cache
> > > > metadata
> > > > > > for
> > > > > > > > > segments
> > > > > > > > > > > > > "residing in local storage". The size of the
> current
> > > > cache
> > > > > > > works
> > > > > > > > > fine
> > > > > > > > > > > for
> > > > > > > > > > > > > the scale of the number of segments that we expect
> to
> > > > store
> > > > > > in
> > > > > > > > > local
> > > > > > > > > > > > > storage. After KIP-405, that cache will continue to
> > > store
> > > > > > > > metadata
> > > > > > > > > for
> > > > > > > > > > > > > segments which are residing in local storage and
> > hence,
> > > > we
> > > > > > > don't
> > > > > > > > > need
> > > > > > > > > > > to
> > > > > > > > > > > > > change that. For segments which have been offloaded
> > to
> > > > > remote
> > > > > > > > > storage,
> > > > > > > > > > > it
> > > > > > > > > > > > > would rely on RLMM. Note that the scale of data
> > stored
> > > in
> > > > > > RLMM
> > > > > > > is
> > > > > > > > > > > > different
> > > > > > > > > > > > > from local cache because the number of segments is
> > > > expected
> > > > > > to
> > > > > > > be
> > > > > > > > > much
> > > > > > > > > > > > > larger than what current implementation stores in
> > local
> > > > > > > storage.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2,3,4:
> > RemoteLogMetadataManager.listRemoteLogSegments()
> > > > > does
> > > > > > > > > specify
> > > > > > > > > > > the
> > > > > > > > > > > > > order i.e. it returns the segments sorted by first
> > > offset
> > > > > in
> > > > > > > > > ascending
> > > > > > > > > > > > > order. I am copying the API docs for KIP-405 here
> for
> > > > your
> > > > > > > > > reference
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Returns iterator of remote log segment metadata, sorted
> > > > > > > > > > > > > by {@link RemoteLogSegmentMetadata#startOffset()} in
> > > > > > > > > > > > > ascending order which contains the given leader epoch.
> > > > > > > > > > > > > This is used by remote log retention management
> > > > > > > > > > > > > subsystem to fetch the segment metadata for a given
> > > > > > > > > > > > > leader epoch.
> > > > > > > > > > > > > @param topicIdPartition topic partition
> > > > > > > > > > > > > @param leaderEpoch leader epoch
> > > > > > > > > > > > > @return Iterator of remote segments, sorted by start
> > > > > > > > > > > > > offset in ascending order.
> > > > > > > > > > > > >
> > > > > > > > > > > > > *Luke,*
> > > > > > > > > > > > >
> > > > > > > > > > > > > 5. Note that we are trying to optimize the
> efficiency
> > > of
> > > > > size
> > > > > > > > based
> > > > > > > > > > > > > retention for remote storage. KIP-405 does not
> > > introduce
> > > > a
> > > > > > new
> > > > > > > > > config
> > > > > > > > > > > for
> > > > > > > > > > > > > periodically checking remote similar to
> > > > > > > > > > > log.retention.check.interval.ms
> > > > > > > > > > > > > which is applicable for remote storage. Hence, the
> > > metric
> > > > > > will
> > > > > > > be
> > > > > > > > > > > updated
> > > > > > > > > > > > > at the time of invoking log retention check for
> > remote
> > > > tier
> > > > > > > which
> > > > > > > > > is
> > > > > > > > > > > > > pending implementation today. We can perhaps come
> > back
> > > > and
> > > > > > > update
> > > > > > > > > the
> > > > > > > > > > > > > metric description after the implementation of log
> > > > > retention
> > > > > > > > check
> > > > > > > > > in
> > > > > > > > > > > > > RemoteLogManager.
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Divij Vaidya
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Nov 10, 2022 at 6:16 AM Luke Chen <
> > > > > show...@gmail.com
> > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Divij,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > One more question about the metric:
> > > > > > > > > > > > > > I think the metric will be updated when
> > > > > > > > > > > > > > (1) each time we run the log retention check
> (that
> > > is,
> > > > > > > > > > > > > > log.retention.check.interval.ms)
> > > > > > > > > > > > > > (2) When user explicitly call getRemoteLogSize
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Is that correct?
> > > > > > > > > > > > > > Maybe we should add a note in metric description,
> > > > > > otherwise,
> > > > > > > > when
> > > > > > > > > > > user
> > > > > > > > > > > > > got,
> > > > > > > > > > > > > > let's say 0 of RemoteLogSizeBytes, will be
> > surprised.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Otherwise, LGTM
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thank you for the KIP
> > > > > > > > > > > > > > Luke
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Nov 10, 2022 at 2:55 AM Jun Rao
> > > > > > > > <j...@confluent.io.invalid
> > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi, Divij,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for the explanation.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 1. Hmm, the default implementation of RLMM does
> > > local
> > > > > > > > caching,
> > > > > > > > > > > right?
> > > > > > > > > > > > > > > Currently, we also cache all segment metadata
> in
> > > the
> > > > > > > brokers
> > > > > > > > > > > without
> > > > > > > > > > > > > > > KIP-405. Do you see a need to change that?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2,3,4: Yes, your explanation makes sense.
> > However,
> > > > > > > > > > > > > > > currently,
> > > > > > RemoteLogMetadataManager.listRemoteLogSegments()
> > > > > > > > > doesn't
> > > > > > > > > > > > > > specify
> > > > > > > > > > > > > > > a particular order of the iterator. Do you
> intend
> > > to
> > > > > > change
> > > > > > > > > that?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, Nov 8, 2022 at 3:31 AM Divij Vaidya <
> > > > > > > > > > > divijvaidy...@gmail.com
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hey Jun
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thank you for your comments.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > *1. "RLMM implementor could ensure that
> > > > > > > > > listRemoteLogSegments()
> > > > > > > > > > > is
> > > > > > > > > > > > > > fast"*
> > > > > > > > > > > > > > > > This would be ideal but pragmatically, it is
> > > > > difficult
> > > > > > to
> > > > > > > > > ensure
> > > > > > > > > > > > that
> > > > > > > > > > > > > > > > listRemoteLogSegments() is fast. This is
> > because
> > > of
> > > > > the
> > > > > > > > > > > possibility
> > > > > > > > > > > > > of
> > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > large number of segments (much larger than
> what
> > > > Kafka
> > > > > > > > > currently
> > > > > > > > > > > > > handles
> > > > > > > > > > > > > > > > with local storage today) would make it
> > > infeasible
> > > > to
> > > > > > > adopt
> > > > > > > > > > > > > strategies
> > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > as local caching to improve the performance
> of
> > > > > > > > > > > > listRemoteLogSegments.
> > > > > > > > > > > > > > > Apart
> > > > > > > > > > > > > > > > from caching (which won't work due to size
> > > > > > limitations) I
> > > > > > > > > can't
> > > > > > > > > > > > think
> > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > other strategies which may eliminate the need
> > for
> > > > IO
> > > > > > > > > > > > > > > > operations proportional to the number of
> total
> > > > > > segments.
> > > > > > > > > Please
> > > > > > > > > > > > > advise
> > > > > > > > > > > > > > if
> > > > > > > > > > > > > > > > you have something in mind.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 2.  "*If the size exceeds the retention size,
> > we
> > > > need
> > > > > > to
> > > > > > > > > > > determine
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > subset of segments to delete to bring the
> size
> > > > within
> > > > > > the
> > > > > > > > > > > retention
> > > > > > > > > > > > > > > limit.
> > > > > > > > > > > > > > > > Do we need to call
> > > > > > > > > > > RemoteLogMetadataManager.listRemoteLogSegments()
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > determine that?"*
> > > > > > > > > > > > > > > > Yes, we need to call listRemoteLogSegments()
> to
> > > > > > determine
> > > > > > > > > which
> > > > > > > > > > > > > > segments
> > > > > > > > > > > > > > > > should be deleted. But there is a difference
> > with
> > > > the
> > > > > > use
> > > > > > > > > case we
> > > > > > > > > > > > are
> > > > > > > > > > > > > > > > trying to optimize with this KIP. To
> determine
> > > the
> > > > > > subset
> > > > > > > > of
> > > > > > > > > > > > segments
> > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > would be deleted, we only read metadata for
> > > > segments
> > > > > > > which
> > > > > > > > > would
> > > > > > > > > > > be
> > > > > > > > > > > > > > > deleted
> > > > > > > > > > > > > > > > via the listRemoteLogSegments(). But to
> > determine
> > > > the
> > > > > > > > > > > totalLogSize,
> > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > is required every time retention logic based
> on
> > > > size
> > > > > > > > > executes, we
> > > > > > > > > > > > > read
> > > > > > > > > > > > > > > > metadata of *all* the segments in remote
> > storage.
> > > > > > Hence,
> > > > > > > > the
> > > > > > > > > > > number
> > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > results returned by
> > > > > > > > > > > > *RemoteLogMetadataManager.listRemoteLogSegments()
> > > > > > > > > > > > > > *is
> > > > > > > > > > > > > > > > different when we are calculating
> totalLogSize
> > > vs.
> > > > > when
> > > > > > > we
> > > > > > > > > are
> > > > > > > > > > > > > > > determining
> > > > > > > > > > > > > > > > the subset of segments to delete.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 3.
> > > > > > > > > > > > > > > > *"Also, what about time-based retention? To
> > make
> > > > that
> > > > > > > > > efficient,
> > > > > > > > > > > do
> > > > > > > > > > > > > we
> > > > > > > > > > > > > > > need
> > > > > > > > > > > > > > > > to make some additional interface
> changes?"*No.
> > > > Note
> > > > > > that
> > > > > > > > > time
> > > > > > > > > > > > > > complexity
> > > > > > > > > > > > > > > > to determine the segments for retention is
> > > > different
> > > > > > for
> > > > > > > > time
> > > > > > > > > > > based
> > > > > > > > > > > > > vs.
> > > > > > > > > > > > > > > > size based. For time based, the time
> complexity
> > > is
> > > > a
> > > > > > > > > function of
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > number
> > > > > > > > > > > > > > > > of segments which are "eligible for deletion"
> > > > (since
> > > > > we
> > > > > > > > only
> > > > > > > > > read
> > > > > > > > > > > > > > > metadata
> > > > > > > > > > > > > > > > for segments which would be deleted) whereas
> in
> > > > size
> > > > > > > based
> > > > > > > > > > > > retention,
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > time complexity is a function of "all
> segments"
> > > > > > available
> > > > > > > > in
> > > > > > > > > > > remote
> > > > > > > > > > > > > > > storage
> > > > > > > > > > > > > > > > (metadata of all segments needs to be read to
> > > > > calculate
> > > > > > > the
> > > > > > > > > total
> > > > > > > > > > > > > > size).
> > > > > > > > > > > > > > > As
> > > > > > > > > > > > > > > > you may observe, this KIP will bring the time
> > > > > > complexity
> > > > > > > > for
> > > > > > > > > both
> > > > > > > > > > > > > time
> > > > > > > > > > > > > > > > based retention & size based retention to the
> > > same
> > > > > > > > function.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 4. Also, please note that this new API
> > introduced
> > > > in
> > > > > > this
> > > > > > > > KIP
> > > > > > > > > > > also
> > > > > > > > > > > > > > > enables
> > > > > > > > > > > > > > > > us to provide a metric for total size of data
> > > > stored
> > > > > in
> > > > > > > > > remote
> > > > > > > > > > > > > storage.
> > > > > > > > > > > > > > > > Without the API, calculation of this metric
> > will
> > > > > become
> > > > > > > > very
> > > > > > > > > > > > > expensive
> > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > *listRemoteLogSegments().*
> > > > > > > > > > > > > > > > I understand that your motivation here is to
> > > avoid
> > > > > > > > polluting
> > > > > > > > > the
> > > > > > > > > > > > > > > interface
> > > > > > > > > > > > > > > > with optimization specific APIs and I will
> > agree
> > > > with
> > > > > > > that
> > > > > > > > > goal.
> > > > > > > > > > > > But
> > > > > > > > > > > > > I
> > > > > > > > > > > > > > > > believe that this new API proposed in the KIP
> > > > brings
> > > > > in
> > > > > > > > > > > significant
> > > > > > > > > > > > > > > > improvement and there is no other work around
> > > > > available
> > > > > > > to
> > > > > > > > > > > achieve
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > performance.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > > Divij Vaidya
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Tue, Nov 8, 2022 at 12:12 AM Jun Rao
> > > > > > > > > <j...@confluent.io.invalid
> > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi, Divij,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for the KIP. Sorry for the late
> reply.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The motivation of the KIP is to improve the
> > > > > > efficiency
> > > > > > > of
> > > > > > > > > size
> > > > > > > > > > > > > based
> > > > > > > > > > > > > > > > > retention. I am not sure the proposed
> changes
> > > are
> > > > > > > enough.
> > > > > > > > > For
> > > > > > > > > > > > > > example,
> > > > > > > > > > > > > > > if
> > > > > > > > > > > > > > > > > the size exceeds the retention size, we
> need
> > to
> > > > > > > determine
> > > > > > > > > the
> > > > > > > > > > > > > subset
> > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > segments to delete to bring the size within
> > the
> > > > > > > retention
> > > > > > > > > > > limit.
> > > > > > > > > > > > Do
> > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > need
> > > > > > > > > > > > > > > > > to call
> > > > > > > RemoteLogMetadataManager.listRemoteLogSegments()
> > > > > > > > to
> > > > > > > > > > > > > determine
> > > > > > > > > > > > > > > > that?
> > > > > > > > > > > > > > > > > Also, what about time-based retention? To
> > make
> > > > that
> > > > > > > > > efficient,
> > > > > > > > > > > do
> > > > > > > > > > > > > we
> > > > > > > > > > > > > > > need
> > > > > > > > > > > > > > > > > to make some additional interface changes?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > An alternative approach is for the RLMM
> > > > implementor
> > > > > > to
> > > > > > > > make
> > > > > > > > > > > sure
> > > > > > > > > > > > > > > > > that
> > > > > RemoteLogMetadataManager.listRemoteLogSegments()
> > > > > > > is
> > > > > > > > > fast
> > > > > > > > > > > > > (e.g.,
> > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > > local caching). This way, we could keep the
> > > > > interface
> > > > > > > > > simple.
> > > > > > > > > > > > Have
> > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > considered that?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Sep 28, 2022 at 6:28 AM Divij
> Vaidya
> > <
> > > > > > > > > > > > > > divijvaidy...@gmail.com>
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hey folks
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Does anyone else have any thoughts on
> this
> > > > > before I
> > > > > > > > > propose
> > > > > > > > > > > > this
> > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > vote?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > > Divij Vaidya
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Mon, Sep 5, 2022 at 12:57 PM Satish
> > > Duggana
> > > > <
> > > > > > > > > > > > > > > > satish.dugg...@gmail.com
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Thanks for the KIP Divij!
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > This is a nice improvement to avoid
> > > > > recalculation
> > > > > > > of
> > > > > > > > > size.
> > > > > > > > > > > > > > > Customized
> > > > > > > > > > > > > > > > > > RLMMs
> > > > > > > > > > > > > > > > > > > can implement the best possible
> approach
> > by
> > > > > > caching
> > > > > > > > or
> > > > > > > > > > > > > > maintaining
> > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > size
> > > > > > > > > > > > > > > > > > > in an efficient way. But this is not a
> > big
> > > > > > concern
> > > > > > > > for
> > > > > > > > > the
> > > > > > > > > > > > > > default
> > > > > > > > > > > > > > > > > topic
> > > > > > > > > > > > > > > > > > > based RLMM as mentioned in the KIP.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > ~Satish.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Wed, 13 Jul 2022 at 18:48, Divij
> > Vaidya
> > > <
> > > > > > > > > > > > > > > divijvaidy...@gmail.com>
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thank you for your review Luke.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Reg: is that would the new
> > > > > > `RemoteLogSizeBytes`
> > > > > > > > > metric
> > > > > > > > > > > > be a
> > > > > > > > > > > > > > > > > > performance
> > > > > > > > > > > > > > > > > > > > overhead? Although we move the
> > > calculation
> > > > > to a
> > > > > > > > > seperate
> > > > > > > > > > > > API,
> > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > still
> > > > > > > > > > > > > > > > > > > > can't assume users will implement a
> > > > > > light-weight
> > > > > > > > > method,
> > > > > > > > > > > > > right?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > This metric would be logged using the
> > > > > > information
> > > > > > > > > that is
> > > > > > > > > > > > > > already
> > > > > > > > > > > > > > > > > being
> > > > > > > > > > > > > > > > > > > > calculated for handling remote
> > retention
> > > > > logic,
> > > > > > > > > hence, no
> > > > > > > > > > > > > > > > additional
> > > > > > > > > > > > > > > > > > work
> > > > > > > > > > > > > > > > > > > > is required to calculate this metric.
> > > More
> > > > > > > > > specifically,
> > > > > > > > > > > > > > whenever
> > > > > > > > > > > > > > > > > > > > RemoteLogManager calls
> getRemoteLogSize
> > > > API,
> > > > > > this
> > > > > > > > > metric
> > > > > > > > > > > > > would
> > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > captured.
> > > > > > > > > > > > > > > > > > > > This API call is made every time
> > > > > > RemoteLogManager
> > > > > > > > > wants
> > > > > > > > > > > to
> > > > > > > > > > > > > > handle
> > > > > > > > > > > > > > > > > > expired
> > > > > > > > > > > > > > > > > > > > remote log segments (which should be
> > > > > periodic).
> > > > > > > > Does
> > > > > > > > > that
> > > > > > > > > > > > > > address
> > > > > > > > > > > > > > > > > your
> > > > > > > > > > > > > > > > > > > > concern?
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Divij Vaidya
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > On Tue, Jul 12, 2022 at 11:01 AM Luke
> > > Chen
> > > > <
> > > > > > > > > > > > > show...@gmail.com>
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Hi Divij,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks for the KIP!
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > I think it makes sense to delegate
> > the
> > > > > > > > > responsibility
> > > > > > > > > > > of
> > > > > > > > > > > > > > > > > calculation
> > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > specific RemoteLogMetadataManager
> > > > > > > implementation.
> > > > > > > > > > > > > > > > > > > > > But one thing I'm not quite sure,
> is
> > > that
> > > > > > would
> > > > > > > > > the new
> > > > > > > > > > > > > > > > > > > > > `RemoteLogSizeBytes` metric be a
> > > > > performance
> > > > > > > > > overhead?
> > > > > > > > > > > > > > > > > > > > > Although we move the calculation
> to a
> > > > > > seperate
> > > > > > > > > API, we
> > > > > > > > > > > > > still
> > > > > > > > > > > > > > > > can't
> > > > > > > > > > > > > > > > > > > assume
> > > > > > > > > > > > > > > > > > > > > users will implement a light-weight
> > > > method,
> > > > > > > > right?
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thank you.
> > > > > > > > > > > > > > > > > > > > > Luke
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Fri, Jul 1, 2022 at 5:47 PM
> Divij
> > > > > Vaidya <
> > > > > > > > > > > > > > > > > divijvaidy...@gmail.com
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-852%3A+Optimize+calculation+of+size+for+log+in+remote+tier
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Hey folks
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Please take a look at this KIP
> > which
> > > > > > proposes
> > > > > > > > an
> > > > > > > > > > > > > extension
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > KIP-405.
> > > > > > > > > > > > > > > > > > > > > This
> > > > > > > > > > > > > > > > > > > > > > is my first KIP with Apache Kafka
> > > > > community
> > > > > > > so
> > > > > > > > > any
> > > > > > > > > > > > > feedback
> > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > be
> > > > > > > > > > > > > > > > > > > > > highly
> > > > > > > > > > > > > > > > > > > > > > appreciated.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Cheers!
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > > > > > > Divij Vaidya
> > > > > > > > > > > > > > > > > > > > > > Sr. Software Engineer
> > > > > > > > > > > > > > > > > > > > > > Amazon
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
