Hey team,

I have finished the code change for this KIP, and PR is here
https://github.com/apache/kafka/pull/22459. There are 2 commits in this PR,
the first commit includes all the source code change, the second commit
includes all the test changes.
I have adopted some advices from this email chain and have also updated the
KIP and code accordingly.

Would love to hear your thoughts.

Thanks,
Lijun Tong

jian fu <[email protected]> 于2026年4月11日周六 18:38写道:

> Hi Kamal and  Lijun Tong
>
> Maybe another approach (Not good solution) is when we sure about the work
> is complete. we send all the keys' to null to delete all the data.
> It mean the keys are:
> clusterid:topic:partition:offset: COPY_SEGMENT_FINISHED
> clusterid:topic:partition:offset:  DELETE_SEGMENT_STARTED
> clusterid:topic:partition:offset:  DELETE_SEGMENT_FINISHED
> ----------------
>
> This solution won't change the old payload in the message. Thus the
> solution is somehow strange.
> Or carry all complete message so that keep the latest maybe can work.
>
> BTW: There are four type message need to take care:
> RemoteLogSegmentMetadataRecord
> RemoteLogSegmentMetadataUpdateRecord
> RemotePartitionDeleteMetadataRecord
> RemoteLogSegmentMetadataSnapshotRecord
>
> Regards
> Jian
>
>
> Lijun Tong <[email protected]> 于2026年4月7日周二 08:35写道:
>
> > Hi Kamal,
> >
> >  I now see what you mean regarding the DELETE_SEGMENT_STARTED handling.
> >
> >   I was thinking that the broker's logic that currently ignores log
> segment
> > messages when COPY_SEGMENT_STARTED doesn't exist might need to be updated
> > for the
> >    new message format. Since the new message contains useful information
> > like endOffset, topicID, and partition, it could provide sufficient
> context
> > to help
> >   with the retry of deletion operations. This suggests the broker might
> not
> > need to ignore these messages solely because of the absence of
> >   COPY_SEGMENT_STARTED.
> >
> >   With the new key-based format, we would have enough metadata in the
> > DELETE_SEGMENT_STARTED event itself to:
> >   - Identify the remote segment location
> >   - Track and retry the deletion process
> >   - Prevent orphaned segments in remote storage
> >
> >   I believe this means that with the new keyed message format, we could
> > avoid orphaned remote segments that might occur when COPY_SEGMENT_STARTED
> > is cleaned
> >   by the retention policy. The DELETE_SEGMENT_STARTED event would carry
> all
> > the necessary information to complete the deletion, regardless of whether
> > the
> >   base metadata still exists.
> >
> >   For the old format messages, the same mechanism (ignoring when
> > COPY_SEGMENT_STARTED is missing) could still apply, as those messages
> lack
> > the necessary
> >   information for independent processing.
> >
> >   I'd appreciate your thoughts on this approach.
> >
> > Best,
> > Lijun Tong
> >
> >
> > Kamal Chandraprakash <[email protected]> 于2026年4月5日周日
> > 17:37写道:
> >
> > > Hi Lijun,
> > >
> > > Yes, we can get on a call to close out on this. My concern is that if
> the
> > > same key is maintained for a given segment metadata records,
> > > then the newer messages (COPY_FINSIHED, DELETE_STARTED) might
> > > override/compact the previous COPY_STARTED events.
> > > This is not about the old / new format of messages. Assume that all the
> > > messages in the topic are in the new format and contain keys.
> > >
> > > From
> > >
> >
> https://docs.confluent.io/kafka/design/log_compaction.html#topic-compaction
> > > :
> > >
> > > Topic compaction is a mechanism that allows you to retain the latest
> > value
> > > for each message key in a topic, while discarding older values. It
> > > guarantees that the latest value for each message key is always
> retained
> > > within the log of data contained in that topic, making it ideal for use
> > > cases such as restoring state after system failure or reloading caches
> > > after application restarts.
> > >
> > > Thanks,
> > > Kamal
> > >
> > > On Sun, Apr 5, 2026 at 11:37 PM Lijun Tong <[email protected]>
> > > wrote:
> > >
> > > > Hi Kamal,
> > > >
> > > > Thanks for raising this.
> > > >
> > > > Currently, only the existing version of
> > > > RemoteLogSegmentMetadataUpdateRecord
> > > > does not include those fields. We rely on the time-based retention
> > policy
> > > > for cleanup, and this does not impact the ability to rebuild the
> > > > RemoteLogMetadataCache.
> > > >
> > > > The cache reconstruction should still work correctly because it
> depends
> > > on
> > > > the value, and we have not removed any fields from the value.
> > > >
> > > > Regarding the scenario where there are 10 remote segments and the
> > > > __remote_log_metadata topic contains only COPY_SEGMENT_FINISHED
> events:
> > > the
> > > > COPY_SEGMENT_STARTED events will not be compacted in this case, since
> > > > messages with a null key are not subject to compaction.
> > > >
> > > > Once older-format messages are cleaned up by the time-based retention
> > > > policy and compaction is enabled, records with the same key will be
> > > > compacted asynchronously and correctly. Given this, I don’t believe
> we
> > > need
> > > > to introduce a separate key for COPY_SEGMENT_STARTED events.
> > > >
> > > > Happy to jump on a call if it’s easier to discuss further.
> > > >
> > > > Best,
> > > >
> > > > Lijun Tong
> > > >
> > > > Lijun Tong <[email protected]> 于2026年4月5日周日 10:56写道:
> > > >
> > > > > Hey Kamal,
> > > > >
> > > > > I am not very clear on what's the question you mentioned above, I
> am
> > > > > happy to jump to a call to discuss further, and I lived in PST time
> > > zone.
> > > > > Maybe we can meet online through google meet?
> > > > >
> > > > > Thanks,
> > > > > Lijun Tong
> > > > >
> > > > > Kamal Chandraprakash <[email protected]> 于2026年4月1日周三
> > > > > 02:38写道:
> > > > >
> > > > >> Hi Lijun,
> > > > >>
> > > > >> Thanks for the update! I'm still not clear on this.
> > > > >>
> > > > >> The RemoteLogSegmentMetadataUpdateRecord does not contain the
> below
> > > > fields
> > > > >> compared to RemoteLogSegmentMetadataRecord:
> > > > >>
> > > > >> - startOffset
> > > > >> - endOffset (will be added as a tagged field)
> > > > >> - MaxTimestampMs
> > > > >> - SegmentLeaderEpochs
> > > > >> - SegmentSizeInBytes and
> > > > >> - TxnIndexEmpty
> > > > >>
> > > > >> When a broker gets restarted, will it be able to rebuild
> > > > >> the RemoteLogMetadataCache? Assume that there are 10 remote
> > > > >> segments and the __remote_log_metadata topic contains only the
> > > > >> COPY_SEGMENT_FINISHED events; the COPY_SEGMENT_STARTED event
> > > > >> gets compacted as the key is the same.
> > > > >>
> > > > >> Do we need a separate key for the COPY_SEGMENT_STARTED event and
> > > another
> > > > >> key for the remaining states?
> > > > >>
> > > > >> Current key format: TopicIdPartition:EndOffset:BrokerLeaderEpoch
> > > > >> Proposed key format:
> > TopicIdPartition:EndOffset:BrokerLeaderEpoch:x/y
> > > > >> where
> > > > >> x denotes a identifier for COPY_SEGMENT_STARTED and y denote for
> all
> > > the
> > > > >> other events.
> > > > >>
> > > > >> Thanks,
> > > > >> Kamal
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Tue, Mar 31, 2026 at 8:23 AM Lijun Tong <
> [email protected]
> > >
> > > > >> wrote:
> > > > >>
> > > > >> > Hi Kamal,
> > > > >> >
> > > > >> > The scenario you described only happened with the old version
> > > > >> > RemoteLogSegmentUpdateMetadata message, since the endOffset will
> > be
> > > > >> added
> > > > >> > in the new RemoteLogSegmentUpdateMetadata schema. For the
> existing
> > > > >> > RemoteLogSegmentUpdateMetadata messages, we rely on the time
> based
> > > > >> > retention policy to clean up. Does that make sense?
> > > > >> >
> > > > >> > Best,
> > > > >> > Lijun Tong
> > > > >> >
> > > > >> > Kamal Chandraprakash <[email protected]>
> > 于2026年3月30日周一
> > > > >> > 18:14写道:
> > > > >> >
> > > > >> > > Hi Lijun,
> > > > >> > >
> > > > >> > > RemoteLogSegmentUpdateMetadata event does not have all the
> > > > >> > > fields/attributes similar to RemoteLogSegmentMetadata event.
> > > > >> > >
> > > > >> > > Assume that after compaction, for a segment, we have only
> > > > >> > > COPY_SEGMENT_FINISHED records. How do you plan to retrieve the
> > > other
> > > > >> > fields
> > > > >> > > after broker restart?
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > > Kamal
> > > > >> > >
> > > > >> > > On Mon, Mar 30, 2026, 23:22 Lijun Tong <
> [email protected]
> > >
> > > > >> wrote:
> > > > >> > >
> > > > >> > > > Hi Kamal,
> > > > >> > > >
> > > > >> > > > Thanks for taking another look at the KIP.
> > > > >> > > > 1. I have removed the left-over line about using another new
> > > topic
> > > > >> from
> > > > >> > > the
> > > > >> > > > KIP.
> > > > >> > > > 2.
> > > > >> > > >
> > > > >> > > > > 2. Assume that the topic is enabled with compaction and
> only
> > > one
> > > > >> > event
> > > > >> > > is
> > > > >> > > > > maintained per segment. If there is a transient error in
> the
> > > > >> remote
> > > > >> > log
> > > > >> > > > > deletion,
> > > > >> > > > >     then the COPY_SEGMENT started / finished events might
> be
> > > > >> > compacted
> > > > >> > > by
> > > > >> > > > > the DELETE_SEGMENT_STARTED events. If the broker is
> > restarted
> > > > >> during
> > > > >> > > > >     this time, will there be dangling remote log segments?
> > > > >> Currently,
> > > > >> > > > > during restart, the broker discards the events if it does
> > not
> > > > see
> > > > >> the
> > > > >> > > > > COPY_SEGMENT_STARTED      events.
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > I am glad you asked this question, I didn't mention this
> part
> > in
> > > > my
> > > > >> > > current
> > > > >> > > > design to avoid distractions from the main design, but I
> plan
> > to
> > > > add
> > > > >> > > > another background thread to clean up all the stale messages
> > by
> > > > >> > comparing
> > > > >> > > > the message's endOffset with the topic partition's log start
> > > > >> offset. I
> > > > >> > > > believe this would help remove all the dangling messages.
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > > Lijun TOng
> > > > >> > > >
> > > > >> > > > Kamal Chandraprakash <[email protected]>
> > > > 于2026年3月29日周日
> > > > >> > > > 22:48写道:
> > > > >> > > >
> > > > >> > > > > Hi Lijun,
> > > > >> > > > >
> > > > >> > > > > Sorry for the late reply. Went over the KIP again. Overall
> > > LGTM.
> > > > >> Few
> > > > >> > > > > points:
> > > > >> > > > >
> > > > >> > > > > > This KIP aims to solve this issue through introducing
> > > another
> > > > >> > > compacted
> > > > >> > > > > topic for the brokers to bootstrap the state from
> > > > >> > > > >
> > > > >> > > > > 1. Shall we update the motivation section to mention that
> > > > another
> > > > >> > topic
> > > > >> > > > is
> > > > >> > > > > not introduced?
> > > > >> > > > > 2. Assume that the topic is enabled with compaction and
> only
> > > one
> > > > >> > event
> > > > >> > > is
> > > > >> > > > > maintained per segment. If there is a transient error in
> the
> > > > >> remote
> > > > >> > log
> > > > >> > > > > deletion,
> > > > >> > > > >     then the COPY_SEGMENT started / finished events might
> be
> > > > >> > compacted
> > > > >> > > by
> > > > >> > > > > the DELETE_SEGMENT_STARTED events. If the broker is
> > restarted
> > > > >> during
> > > > >> > > > >     this time, will there be dangling remote log segments?
> > > > >> Currently,
> > > > >> > > > > during restart, the broker discards the events if it does
> > not
> > > > see
> > > > >> the
> > > > >> > > > > COPY_SEGMENT_STARTED      events.
> > > > >> > > > >
> > > > >> > > > > Thanks,
> > > > >> > > > > Kamal
> > > > >> > > > >
> > > > >> > > > > On Thu, Mar 26, 2026 at 5:08 AM Lijun Tong <
> > > > >> [email protected]>
> > > > >> > > > > wrote:
> > > > >> > > > >
> > > > >> > > > > > Hi,
> > > > >> > > > > >
> > > > >> > > > > > I have started a Vote thread for this KIP, considering
> all
> > > > >> > questions
> > > > >> > > > > raised
> > > > >> > > > > > so far have been answered. I am happy to continue the
> > > > >> discussion if
> > > > >> > > > > needed,
> > > > >> > > > > > otherwise, this is a friendly reminder on the vote for
> > this
> > > > KIP.
> > > > >> > > > > >
> > > > >> > > > > > Thanks,
> > > > >> > > > > > Lijun Tong
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > > Lijun Tong <[email protected]> 于2026年1月19日周一
> > 17:59写道:
> > > > >> > > > > >
> > > > >> > > > > > > Hey Kamal,
> > > > >> > > > > > >
> > > > >> > > > > > > Thanks for raising these questions. Here are my
> > responses
> > > to
> > > > >> your
> > > > >> > > > > > > questions:
> > > > >> > > > > > > Q1 and Q2:
> > > > >> > > > > > > I think both questions boil down to how to release
> this
> > > new
> > > > >> > > feature,
> > > > >> > > > > both
> > > > >> > > > > > > questions are valid concerns. The solution I have in
> > mind
> > > is
> > > > >> this
> > > > >> > > > > feature
> > > > >> > > > > > > is *gated by the metadata version*. The new tombstone
> > > > >> semantics
> > > > >> > and
> > > > >> > > > the
> > > > >> > > > > > > additional fields (for example in
> > > > >> RemoteLogSegmentUpdateRecord)
> > > > >> > are
> > > > >> > > > > only
> > > > >> > > > > > > enabled once the cluster metadata version is upgraded
> to
> > > the
> > > > >> > > version
> > > > >> > > > > that
> > > > >> > > > > > > introduces this feature. As long as the cluster
> metadata
> > > > >> version
> > > > >> > is
> > > > >> > > > not
> > > > >> > > > > > > bumped, the system will not produce tombstone records.
> > > > >> Therefore,
> > > > >> > > > > during
> > > > >> > > > > > > rolling upgrades (mixed 4.2/4.3 brokers), the feature
> > > > remains
> > > > >> > > > > effectively
> > > > >> > > > > > > disabled. Tombstones will only start being produced
> > after
> > > > the
> > > > >> > > > metadata
> > > > >> > > > > > > version is upgraded, at which point all brokers are
> > > already
> > > > >> > > required
> > > > >> > > > to
> > > > >> > > > > > > support the new behavior.
> > > > >> > > > > > >
> > > > >> > > > > > > Since Kafka does not support metadata version
> downgrades
> > > at
> > > > >> the
> > > > >> > > > moment,
> > > > >> > > > > > > once a metadata version that supports this feature is
> > > > >> enabled, it
> > > > >> > > > > cannot
> > > > >> > > > > > be
> > > > >> > > > > > > downgraded to a version that does not support it. I
> will
> > > add
> > > > >> > these
> > > > >> > > > > > details
> > > > >> > > > > > > to the KIP later.
> > > > >> > > > > > > Q3. This is an *editing mistake* in the KIP. Thanks
> for
> > > > >> pointing
> > > > >> > it
> > > > >> > > > > out —
> > > > >> > > > > > > the enum value has already been corrected in the
> latest
> > > > >> revision
> > > > >> > to
> > > > >> > > > > > remove
> > > > >> > > > > > > the unused placeholder and keep the state values
> > > contiguous
> > > > >> and
> > > > >> > > > > > consistent.
> > > > >> > > > > > > Q4. I don't foresee the quota mechanism will interfere
> > > with
> > > > >> the
> > > > >> > > state
> > > > >> > > > > > > transition in any way so far, let me know if any
> concern
> > > > hits
> > > > >> > you.
> > > > >> > > > > > >
> > > > >> > > > > > > Thanks,
> > > > >> > > > > > > Lijun
> > > > >> > > > > > >
> > > > >> > > > > > > Kamal Chandraprakash <[email protected]>
> > > > >> > > 于2026年1月18日周日
> > > > >> > > > > > > 00:40写道:
> > > > >> > > > > > >
> > > > >> > > > > > >> Hi Lijun,
> > > > >> > > > > > >>
> > > > >> > > > > > >> Thanks for updating the KIP!
> > > > >> > > > > > >>
> > > > >> > > > > > >> The updated migration plan looks clean to me. Few
> > > > questions:
> > > > >> > > > > > >>
> > > > >> > > > > > >> 1. The ConsumerTask in 4.2 Kafka build does not
> handle
> > > the
> > > > >> > > tombstone
> > > > >> > > > > > >> records. Should the tombstone records be sent only
> when
> > > all
> > > > >> the
> > > > >> > > > > brokers
> > > > >> > > > > > >> are
> > > > >> > > > > > >> upgraded to 4.3 version?
> > > > >> > > > > > >>
> > > > >> > > > > > >> 2. Once all the brokers are upgraded and the
> > > > >> > __remote_log_metadata
> > > > >> > > > > topic
> > > > >> > > > > > >> cleanup policy changed to compact. Then, downgrading
> > the
> > > > >> brokers
> > > > >> > > is
> > > > >> > > > > not
> > > > >> > > > > > >> allowed as the records without key will throw an
> error
> > > > while
> > > > >> > > > producing
> > > > >> > > > > > the
> > > > >> > > > > > >> compacted topic. Shall we mention this in the
> > > compatibility
> > > > >> > > section?
> > > > >> > > > > > >>
> > > > >> > > > > > >> 3. In the RemoteLogSegmentState Enum, why is the
> value
> > 1
> > > > >> marked
> > > > >> > as
> > > > >> > > > > > unused?
> > > > >> > > > > > >>
> > > > >> > > > > > >> 4. Regarding the key
> > > > >> > > (TopicIdPartition:EndOffset:BrokerLeaderEpoch),
> > > > >> > > > > we
> > > > >> > > > > > >> may
> > > > >> > > > > > >> have to check for scenarios where there is segment
> lag
> > > due
> > > > to
> > > > >> > > remote
> > > > >> > > > > log
> > > > >> > > > > > >> write quota. Will the state transition work
> correctly?
> > > Will
> > > > >> come
> > > > >> > > > back
> > > > >> > > > > to
> > > > >> > > > > > >> this later.
> > > > >> > > > > > >>
> > > > >> > > > > > >> Thanks,
> > > > >> > > > > > >> Kamal
> > > > >> > > > > > >>
> > > > >> > > > > > >> On Fri, Jan 16, 2026 at 4:50 AM jian fu <
> > > > >> [email protected]>
> > > > >> > > > wrote:
> > > > >> > > > > > >>
> > > > >> > > > > > >> > Hi Lijun and Kamal
> > > > >> > > > > > >> > I also think we don't need to keep delJIanpolicy in
> > > final
> > > > >> > > > config,if
> > > > >> > > > > > >> so,we
> > > > >> > > > > > >> > should always keep remembering all of our topic
> > > retention
> > > > >> time
> > > > >> > > > must
> > > > >> > > > > > less
> > > > >> > > > > > >> > than the value,right?It is one protect with risk
> > > > involved.
> > > > >> > > > > > >> > Regards
> > > > >> > > > > > >> > JIan
> > > > >> > > > > > >> >
> > > > >> > > > > > >> >
> > > > >> > > > > > >> >
> > > > >> > > > > > >> > Lijun Tong <[email protected]>于2026年1月16日
> > > > 周五06:45写道:
> > > > >> > > > > > >> >
> > > > >> > > > > > >> > > Hey Kamal,
> > > > >> > > > > > >> > >
> > > > >> > > > > > >> > > Some additional points about the Q4,
> > > > >> > > > > > >> > >
> > > > >> > > > > > >> > > > The user can decide when to change their
> internal
> > > > topic
> > > > >> > > > cleanup
> > > > >> > > > > > >> policy
> > > > >> > > > > > >> > to
> > > > >> > > > > > >> > > > compact. If someone retains
> > > > >> > > > > > >> > > > the data in the remote storage for 3 months,
> then
> > > > they
> > > > >> can
> > > > >> > > > > migrate
> > > > >> > > > > > >> to
> > > > >> > > > > > >> > the
> > > > >> > > > > > >> > > > compacted topic after 3 months
> > > > >> > > > > > >> > > > post rolling out this change. And, update their
> > > > cleanup
> > > > >> > > policy
> > > > >> > > > > to
> > > > >> > > > > > >> > > [compact,
> > > > >> > > > > > >> > > > delete].
> > > > >> > > > > > >> > >
> > > > >> > > > > > >> > >
> > > > >> > > > > > >> > > I don't think it's a good idea to keep delete in
> > the
> > > > >> final
> > > > >> > > > cleanup
> > > > >> > > > > > >> policy
> > > > >> > > > > > >> > > for the topic `__remote_log_metadata`, as this
> > still
> > > > >> > requires
> > > > >> > > > the
> > > > >> > > > > > >> user to
> > > > >> > > > > > >> > > keep track of the max retention hours of topics
> > that
> > > > have
> > > > >> > > remote
> > > > >> > > > > > >> storage
> > > > >> > > > > > >> > > enabled, and it's operational burden. It's also
> > hard
> > > to
> > > > >> > reason
> > > > >> > > > > about
> > > > >> > > > > > >> what
> > > > >> > > > > > >> > > will happen if the user configures the wrong
> > > > >> retention.ms.
> > > > >> > I
> > > > >> > > > hope
> > > > >> > > > > > >> this
> > > > >> > > > > > >> > > makes sense.
> > > > >> > > > > > >> > >
> > > > >> > > > > > >> > >
> > > > >> > > > > > >> > > Thanks,
> > > > >> > > > > > >> > > Lijun Tong
> > > > >> > > > > > >> > >
> > > > >> > > > > > >> > > Lijun Tong <[email protected]>
> 于2026年1月15日周四
> > > > >> 11:43写道:
> > > > >> > > > > > >> > >
> > > > >> > > > > > >> > > > Hey Kamal,
> > > > >> > > > > > >> > > >
> > > > >> > > > > > >> > > > Thanks for your reply! I am glad we are on the
> > same
> > > > >> page
> > > > >> > > with
> > > > >> > > > > > making
> > > > >> > > > > > >> > the
> > > > >> > > > > > >> > > > __remote_log_metadata topic compacted optional
> > for
> > > > the
> > > > >> > user
> > > > >> > > > > now, I
> > > > >> > > > > > >> will
> > > > >> > > > > > >> > > > update the KIP with this change.
> > > > >> > > > > > >> > > >
> > > > >> > > > > > >> > > > For the Q2:
> > > > >> > > > > > >> > > > With the key designed as
> > > > >> > > > > > >> TopicId:Partition:EndOffset:BrokerLeaderEpoch,
> > > > >> > > > > > >> > > > even the same broker retries the upload
> multiple
> > > > times
> > > > >> for
> > > > >> > > the
> > > > >> > > > > > same
> > > > >> > > > > > >> log
> > > > >> > > > > > >> > > > segment, the latest retry attempt with the
> latest
> > > > >> segment
> > > > >> > > UUID
> > > > >> > > > > > will
> > > > >> > > > > > >> > > > overwrite the previous attempts' value since
> they
> > > > share
> > > > >> > the
> > > > >> > > > same
> > > > >> > > > > > >> key,
> > > > >> > > > > > >> > so
> > > > >> > > > > > >> > > we
> > > > >> > > > > > >> > > > don't need to explicitly track the failed
> upload
> > > > >> metadata,
> > > > >> > > > > because
> > > > >> > > > > > >> it's
> > > > >> > > > > > >> > > > gone already by the later attempt. That's my
> > > > >> understanding
> > > > >> > > > about
> > > > >> > > > > > the
> > > > >> > > > > > >> > > > RLMCopyTask, correct me if I am wrong.
> > > > >> > > > > > >> > > >
> > > > >> > > > > > >> > > > Thanks,
> > > > >> > > > > > >> > > > Lijun Tong
> > > > >> > > > > > >> > > >
> > > > >> > > > > > >> > > > Kamal Chandraprakash <
> > > [email protected]
> > > > >
> > > > >> > > > > > 于2026年1月14日周三
> > > > >> > > > > > >> > > > 21:18写道:
> > > > >> > > > > > >> > > >
> > > > >> > > > > > >> > > >> Hi Lijun,
> > > > >> > > > > > >> > > >>
> > > > >> > > > > > >> > > >> Thanks for the reply!
> > > > >> > > > > > >> > > >>
> > > > >> > > > > > >> > > >> Q1: Sounds good. Could you clarify it in the
> KIP
> > > > that
> > > > >> the
> > > > >> > > > same
> > > > >> > > > > > >> > > partitioner
> > > > >> > > > > > >> > > >> will be used?
> > > > >> > > > > > >> > > >>
> > > > >> > > > > > >> > > >> Q2: With
> > > > TopicId:Partition:EndOffset:BrokerLeaderEpoch
> > > > >> > key,
> > > > >> > > > if
> > > > >> > > > > > the
> > > > >> > > > > > >> > same
> > > > >> > > > > > >> > > >> broker retries the upload due to intermittent
> > > > >> > > > > > >> > > >> issues in object storage (or) RLMM. Then,
> those
> > > > failed
> > > > >> > > upload
> > > > >> > > > > > >> metadata
> > > > >> > > > > > >> > > >> also
> > > > >> > > > > > >> > > >> need to be cleared.
> > > > >> > > > > > >> > > >>
> > > > >> > > > > > >> > > >> Q3: We may have to skip the null value records
> > in
> > > > the
> > > > >> > > > > > ConsumerTask.
> > > > >> > > > > > >> > > >>
> > > > >> > > > > > >> > > >> Q4a: The idea is to keep the cleanup policy as
> > > > >> "delete"
> > > > >> > and
> > > > >> > > > > also
> > > > >> > > > > > >> send
> > > > >> > > > > > >> > > the
> > > > >> > > > > > >> > > >> tombstone markers
> > > > >> > > > > > >> > > >> to the existing `__remote_log_metadata` topic.
> > > And,
> > > > >> > handle
> > > > >> > > > the
> > > > >> > > > > > >> > tombstone
> > > > >> > > > > > >> > > >> records in the ConsumerTask.
> > > > >> > > > > > >> > > >>
> > > > >> > > > > > >> > > >> The user can decide when to change their
> > internal
> > > > >> topic
> > > > >> > > > cleanup
> > > > >> > > > > > >> policy
> > > > >> > > > > > >> > > to
> > > > >> > > > > > >> > > >> compact. If someone retains
> > > > >> > > > > > >> > > >> the data in the remote storage for 3 months,
> > then
> > > > they
> > > > >> > can
> > > > >> > > > > > migrate
> > > > >> > > > > > >> to
> > > > >> > > > > > >> > > the
> > > > >> > > > > > >> > > >> compacted topic after 3 months
> > > > >> > > > > > >> > > >> post rolling out this change. And, update
> their
> > > > >> cleanup
> > > > >> > > > policy
> > > > >> > > > > to
> > > > >> > > > > > >> > > >> [compact,
> > > > >> > > > > > >> > > >> delete].
> > > > >> > > > > > >> > > >>
> > > > >> > > > > > >> > > >> Thanks,
> > > > >> > > > > > >> > > >> Kamal
> > > > >> > > > > > >> > > >>
> > > > >> > > > > > >> > > >> On Thu, Jan 15, 2026 at 4:12 AM Lijun Tong <
> > > > >> > > > > > >> [email protected]>
> > > > >> > > > > > >> > > >> wrote:
> > > > >> > > > > > >> > > >>
> > > > >> > > > > > >> > > >> > Hey Jian,
> > > > >> > > > > > >> > > >> >
> > > > >> > > > > > >> > > >> > Thanks for your time to review this KIP. I
> > > > >> appreciate
> > > > >> > > that
> > > > >> > > > > you
> > > > >> > > > > > >> > > propose a
> > > > >> > > > > > >> > > >> > simpler migration solution to onboard the
> new
> > > > >> feature.
> > > > >> > > > > > >> > > >> >
> > > > >> > > > > > >> > > >> > There are 2 points that I think can be
> further
> > > > >> refined
> > > > >> > > on:
> > > > >> > > > > > >> > > >> >
> > > > >> > > > > > >> > > >> > 1). make the topic compacted optional,
> > although
> > > > the
> > > > >> new
> > > > >> > > > > feature
> > > > >> > > > > > >> will
> > > > >> > > > > > >> > > >> > continue to emit tombstone message for those
> > > > expired
> > > > >> > log
> > > > >> > > > > > segments
> > > > >> > > > > > >> > even
> > > > >> > > > > > >> > > >> when
> > > > >> > > > > > >> > > >> > the topic is still on time-based retention
> > mode,
> > > > so
> > > > >> > once
> > > > >> > > > user
> > > > >> > > > > > >> > switched
> > > > >> > > > > > >> > > >> to
> > > > >> > > > > > >> > > >> > using the compacted topic, those expired
> > > messages
> > > > >> can
> > > > >> > > still
> > > > >> > > > > be
> > > > >> > > > > > >> > deleted
> > > > >> > > > > > >> > > >> > despite the topic is not retention based
> > > anymore.
> > > > >> > > > > > >> > > >> > 2). we need to expose some flag to the user
> to
> > > > >> indicate
> > > > >> > > > > whether
> > > > >> > > > > > >> the
> > > > >> > > > > > >> > > >> topic
> > > > >> > > > > > >> > > >> > can be flipped to compacted by checking
> > whether
> > > > all
> > > > >> the
> > > > >> > > old
> > > > >> > > > > > >> format
> > > > >> > > > > > >> > > >> > keyed-less message has expired, and allow
> user
> > > to
> > > > >> > choose
> > > > >> > > to
> > > > >> > > > > > flip
> > > > >> > > > > > >> to
> > > > >> > > > > > >> > > >> > compacted only when the flag is true.
> > > > >> > > > > > >> > > >> >
> > > > >> > > > > > >> > > >> > Thanks for sharing your idea. I will update
> > the
> > > > KIP
> > > > >> > later
> > > > >> > > > > with
> > > > >> > > > > > >> this
> > > > >> > > > > > >> > > new
> > > > >> > > > > > >> > > >> > idea.
> > > > >> > > > > > >> > > >> >
> > > > >> > > > > > >> > > >> > Best,
> > > > >> > > > > > >> > > >> > Lijun Tong
> > > > >> > > > > > >> > > >> >
> > > > >> > > > > > >> > > >> >
> > > > >> > > > > > >> > > >> > jian fu <[email protected]>
> 于2026年1月12日周一
> > > > >> 04:55写道:
> > > > >> > > > > > >> > > >> >
> > > > >> > > > > > >> > > >> > > Hi  Lijun Tong:
> > > > >> > > > > > >> > > >> > >
> > > > >> > > > > > >> > > >> > > Thanks for your KIP which raise this
> > critical
> > > > >> issue.
> > > > >> > > > > > >> > > >> > >
> > > > >> > > > > > >> > > >> > > what about just keep one topic instead of
> > > > involve
> > > > >> > > another
> > > > >> > > > > > >> topic.
> > > > >> > > > > > >> > > >> > > for existed topic data's migration. maybe
> we
> > > can
> > > > >> use
> > > > >> > > this
> > > > >> > > > > way
> > > > >> > > > > > >> to
> > > > >> > > > > > >> > > solve
> > > > >> > > > > > >> > > >> > the
> > > > >> > > > > > >> > > >> > > issue:
> > > > >> > > > > > >> > > >> > > (1) set the retention date > all of topic
> > > which
> > > > >> > enable
> > > > >> > > > > remote
> > > > >> > > > > > >> > > >> storage's
> > > > >> > > > > > >> > > >> > > retention time
> > > > >> > > > > > >> > > >> > > (2) deploy new kafka version with feature:
> > > > which
> > > > >> > send
> > > > >> > > > the
> > > > >> > > > > > >> message
> > > > >> > > > > > >> > > >> with
> > > > >> > > > > > >> > > >> > key
> > > > >> > > > > > >> > > >> > > (3) wait all the message expired and new
> > > message
> > > > >> with
> > > > >> > > key
> > > > >> > > > > > >> coming
> > > > >> > > > > > >> > to
> > > > >> > > > > > >> > > >> the
> > > > >> > > > > > >> > > >> > > topic
> > > > >> > > > > > >> > > >> > > (4) convert the topic to compact
> > > > >> > > > > > >> > > >> > >
> > > > >> > > > > > >> > > >> > > I don't test it. Just propose this
> solution
> > > > >> according
> > > > >> > > to
> > > > >> > > > > code
> > > > >> > > > > > >> > review
> > > > >> > > > > > >> > > >> > > result.  just for your reference.
> > > > >> > > > > > >> > > >> > > The steps maybe a little complex. but it
> can
> > > > >> avoiding
> > > > >> > > add
> > > > >> > > > > new
> > > > >> > > > > > >> > topic.
> > > > >> > > > > > >> > > >> > >
> > > > >> > > > > > >> > > >> > > Regards
> > > > >> > > > > > >> > > >> > > Jian
> > > > >> > > > > > >> > > >> > >
> > > > >> > > > > > >> > > >> > > Lijun Tong <[email protected]>
> > > > 于2026年1月8日周四
> > > > >> > > > 09:17写道:
> > > > >> > > > > > >> > > >> > >
> > > > >> > > > > > >> > > >> > > > Hey Kamal,
> > > > >> > > > > > >> > > >> > > >
> > > > >> > > > > > >> > > >> > > >
> > > > >> > > > > > >> > > >> > > > Thanks for your time for the review.
> > > > >> > > > > > >> > > >> > > >
> > > > >> > > > > > >> > > >> > > >
> > > > >> > > > > > >> > > >> > > > Here is my response to your questions:
> > > > >> > > > > > >> > > >> > > >
> > > > >> > > > > > >> > > >> > > > Q1 At this point, I don’t see a need to
> > > change
> > > > >> > > > > > >> > > >> > > > RemoteLogMetadataTopicPartitioner for
> this
> > > > >> design.
> > > > >> > > > > Nothing
> > > > >> > > > > > in
> > > > >> > > > > > >> > the
> > > > >> > > > > > >> > > >> > current
> > > > >> > > > > > >> > > >> > > > approach appears to require a
> partitioner
> > > > >> change,
> > > > >> > but
> > > > >> > > > I’m
> > > > >> > > > > > >> open
> > > > >> > > > > > >> > to
> > > > >> > > > > > >> > > >> > > > revisiting if a concrete need arises.
> > > > >> > > > > > >> > > >> > > >
> > > > >> > > > > > >> > > >> > > > Q2 I have some reservations about using
> > > > >> > > SegmentId:State
> > > > >> > > > > as
> > > > >> > > > > > >> the
> > > > >> > > > > > >> > > key.
> > > > >> > > > > > >> > > >> A
> > > > >> > > > > > >> > > >> > > > practical challenge we see today is that
> > the
> > > > >> same
> > > > >> > > > logical
> > > > >> > > > > > >> > segment
> > > > >> > > > > > >> > > >> can
> > > > >> > > > > > >> > > >> > be
> > > > >> > > > > > >> > > >> > > > retried multiple times with different
> > > > SegmentIds
> > > > >> > > across
> > > > >> > > > > > >> brokers.
> > > > >> > > > > > >> > > If
> > > > >> > > > > > >> > > >> the
> > > > >> > > > > > >> > > >> > > key
> > > > >> > > > > > >> > > >> > > > is SegmentId-based, it becomes harder to
> > > > >> discover
> > > > >> > and
> > > > >> > > > > > >> tombstone
> > > > >> > > > > > >> > > all
> > > > >> > > > > > >> > > >> > > related
> > > > >> > > > > > >> > > >> > > > attempts when the segment eventually
> > > expires.
> > > > >> The
> > > > >> > > > > > >> > > >> > > >
> > > TopicId:Partition:EndOffset:BrokerLeaderEpoch
> > > > >> key
> > > > >> > is
> > > > >> > > > > > >> > deterministic
> > > > >> > > > > > >> > > >> for
> > > > >> > > > > > >> > > >> > a
> > > > >> > > > > > >> > > >> > > > logical segment attempt and helps group
> > > > retries
> > > > >> by
> > > > >> > > > epoch,
> > > > >> > > > > > >> which
> > > > >> > > > > > >> > > >> > > simplifies
> > > > >> > > > > > >> > > >> > > > cleanup and reasoning about state. I’d
> > love
> > > to
> > > > >> > > > understand
> > > > >> > > > > > the
> > > > >> > > > > > >> > > >> benefits
> > > > >> > > > > > >> > > >> > > > you’re seeing with SegmentId:State
> > compared
> > > to
> > > > >> the
> > > > >> > > > > > >> > > >> offset/epoch-based
> > > > >> > > > > > >> > > >> > key
> > > > >> > > > > > >> > > >> > > > so we can weigh the trade-offs.
> > > > >> > > > > > >> > > >> > > >
> > > > >> > > > > > >> > > >> > > > On partitioning: with this proposal, all
> > > > states
> > > > >> > for a
> > > > >> > > > > given
> > > > >> > > > > > >> user
> > > > >> > > > > > >> > > >> > > > topic-partition still map to the same
> > > metadata
> > > > >> > > > partition.
> > > > >> > > > > > >> That
> > > > >> > > > > > >> > > >> remains
> > > > >> > > > > > >> > > >> > > true
> > > > >> > > > > > >> > > >> > > > for the existing __remote_log_metadata
> > > > >> (unchanged
> > > > >> > > > > > >> partitioner)
> > > > >> > > > > > >> > and
> > > > >> > > > > > >> > > >> for
> > > > >> > > > > > >> > > >> > > the
> > > > >> > > > > > >> > > >> > > > new __remote_log_metadata_compacted,
> > > > preserving
> > > > >> the
> > > > >> > > > > > >> properties
> > > > >> > > > > > >> > > >> > > > RemoteMetadataCache relies on.
> > > > >> > > > > > >> > > >> > > >
> > > > >> > > > > > >> > > >> > > > Q3 It should be fine for ConsumerTask to
> > > > ignore
> > > > >> > > > tombstone
> > > > >> > > > > > >> > records
> > > > >> > > > > > >> > > >> (null
> > > > >> > > > > > >> > > >> > > > values) and no-op.
> > > > >> > > > > > >> > > >> > > >
> > > > >> > > > > > >> > > >> > > > Q4 Although TBRLMM is a sample RLMM
> > > > >> implementation,
> > > > >> > > > it’s
> > > > >> > > > > > >> > currently
> > > > >> > > > > > >> > > >> the
> > > > >> > > > > > >> > > >> > > only
> > > > >> > > > > > >> > > >> > > > OSS option and is widely used. The new
> > > > >> > > > > > >> > > >> __remote_log_metadata_compacted
> > > > >> > > > > > >> > > >> > > > topic offers clear operational benefits
> in
> > > > that
> > > > >> > > > context.
> > > > >> > > > > We
> > > > >> > > > > > >> can
> > > > >> > > > > > >> > > also
> > > > >> > > > > > >> > > >> > > > provide a configuration to let users
> > choose
> > > > >> whether
> > > > >> > > > they
> > > > >> > > > > > >> want to
> > > > >> > > > > > >> > > >> keep
> > > > >> > > > > > >> > > >> > the
> > > > >> > > > > > >> > > >> > > > audit topic (__remote_log_metadata) in
> > their
> > > > >> > cluster.
> > > > >> > > > > > >> > > >> > > >
> > > > >> > > > > > >> > > >> > > > Q4a Enabling compaction on
> > > > __remote_log_metadata
> > > > >> > > alone
> > > > >> > > > > may
> > > > >> > > > > > >> not
> > > > >> > > > > > >> > > fully
> > > > >> > > > > > >> > > >> > > > address the unbounded growth, since we
> > also
> > > > >> need to
> > > > >> > > > emit
> > > > >> > > > > > >> > > tombstones
> > > > >> > > > > > >> > > >> for
> > > > >> > > > > > >> > > >> > > > expired keys to delete them. Deferring
> > > > >> compaction
> > > > >> > and
> > > > >> > > > > > >> > tombstoning
> > > > >> > > > > > >> > > to
> > > > >> > > > > > >> > > >> > user
> > > > >> > > > > > >> > > >> > > > configuration could make the code flow
> > > > >> complicated,
> > > > >> > > > also
> > > > >> > > > > > add
> > > > >> > > > > > >> > > >> > operational
> > > > >> > > > > > >> > > >> > > > complexity and make outcomes less
> > > predictable.
> > > > >> The
> > > > >> > > > > proposal
> > > > >> > > > > > >> aims
> > > > >> > > > > > >> > > to
> > > > >> > > > > > >> > > >> > > provide
> > > > >> > > > > > >> > > >> > > > a consistent experience by defining
> > > > >> deterministic
> > > > >> > > keys
> > > > >> > > > > and
> > > > >> > > > > > >> > > emitting
> > > > >> > > > > > >> > > >> > > > tombstones as part of the broker’s
> > > > >> > responsibilities,
> > > > >> > > > > while
> > > > >> > > > > > >> still
> > > > >> > > > > > >> > > >> > allowing
> > > > >> > > > > > >> > > >> > > > users to opt out of the audit topic if
> > they
> > > > >> prefer.
> > > > >> > > > But I
> > > > >> > > > > > am
> > > > >> > > > > > >> > open
> > > > >> > > > > > >> > > to
> > > > >> > > > > > >> > > >> > more
> > > > >> > > > > > >> > > >> > > > discussion if there is any concrete
> need I
> > > > don't
> > > > >> > > > foresee.
> > > > >> > > > > > >> > > >> > > >
> > > > >> > > > > > >> > > >> > > >
> > > > >> > > > > > >> > > >> > > > Thanks,
> > > > >> > > > > > >> > > >> > > >
> > > > >> > > > > > >> > > >> > > > Lijun Tong
> > > > >> > > > > > >> > > >> > > >
> > > > >> > > > > > >> > > >> > > > Kamal Chandraprakash <
> > > > >> > [email protected]
> > > > >> > > >
> > > > >> > > > > > >> > > 于2026年1月6日周二
> > > > >> > > > > > >> > > >> > > > 01:01写道:
> > > > >> > > > > > >> > > >> > > >
> > > > >> > > > > > >> > > >> > > > > Hi Lijun,
> > > > >> > > > > > >> > > >> > > > >
> > > > >> > > > > > >> > > >> > > > > Thanks for the KIP! Went over the
> first
> > > > pass.
> > > > >> > > > > > >> > > >> > > > >
> > > > >> > > > > > >> > > >> > > > > Few Questions:
> > > > >> > > > > > >> > > >> > > > >
> > > > >> > > > > > >> > > >> > > > > 1. Are we going to maintain the same
> > > > >> > > > > > >> > > >> > RemoteLogMetadataTopicPartitioner
> > > > >> > > > > > >> > > >> > > > > <
> > > > >> > > > > > >> > > >> > > > >
> > > > >> > > > > > >> > > >> > > >
> > > > >> > > > > > >> > > >> > >
> > > > >> > > > > > >> > > >> >
> > > > >> > > > > > >> > > >>
> > > > >> > > > > > >> > >
> > > > >> > > > > > >> >
> > > > >> > > > > > >>
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/RemoteLogMetadataTopicPartitioner.java
> > > > >> > > > > > >> > > >> > > > > >
> > > > >> > > > > > >> > > >> > > > > for both the topics? It is not clear
> in
> > > the
> > > > >> KIP,
> > > > >> > > > could
> > > > >> > > > > > you
> > > > >> > > > > > >> > > clarify
> > > > >> > > > > > >> > > >> > it?
> > > > >> > > > > > >> > > >> > > > > 2. Can the key be changed to
> > > SegmentId:State
> > > > >> > > instead
> > > > >> > > > of
> > > > >> > > > > > >> > > >> > > > >
> > > > TopicId:Partition:EndOffset:BrokerLeaderEpoch
> > > > >> if
> > > > >> > > the
> > > > >> > > > > same
> > > > >> > > > > > >> > > >> partitioner
> > > > >> > > > > > >> > > >> > > is
> > > > >> > > > > > >> > > >> > > > > used? It is good to maintain all the
> > > segment
> > > > >> > states
> > > > >> > > > > for a
> > > > >> > > > > > >> > > >> > > > > user-topic-partition in the same
> > metadata
> > > > >> > > partition.
> > > > >> > > > > > >> > > >> > > > > 3. Should we have to handle the
> records
> > > with
> > > > >> null
> > > > >> > > > value
> > > > >> > > > > > >> > > >> (tombstone)
> > > > >> > > > > > >> > > >> > in
> > > > >> > > > > > >> > > >> > > > the
> > > > >> > > > > > >> > > >> > > > > ConsumerTask
> > > > >> > > > > > >> > > >> > > > > <
> > > > >> > > > > > >> > > >> > > > >
> > > > >> > > > > > >> > > >> > > >
> > > > >> > > > > > >> > > >> > >
> > > > >> > > > > > >> > > >> >
> > > > >> > > > > > >> > > >>
> > > > >> > > > > > >> > >
> > > > >> > > > > > >> >
> > > > >> > > > > > >>
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/ConsumerTask.java?L166
> > > > >> > > > > > >> > > >> > > > > >
> > > > >> > > > > > >> > > >> > > > > ?
> > > > >> > > > > > >> > > >> > > > > 4. TBRLMM
> > > > >> > > > > > >> > > >> > > > > <
> > > > >> > > > > > >> > > >> > > > >
> > > > >> > > > > > >> > > >> > > >
> > > > >> > > > > > >> > > >> > >
> > > > >> > > > > > >> > > >> >
> > > > >> > > > > > >> > > >>
> > > > >> > > > > > >> > >
> > > > >> > > > > > >> >
> > > > >> > > > > > >>
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/TopicBasedRemoteLogMetadataManager.java
> > > > >> > > > > > >> > > >> > > > > >
> > > > >> > > > > > >> > > >> > > > > is a sample plugin implementation of
> > RLMM.
> > > > Not
> > > > >> > sure
> > > > >> > > > > > whether
> > > > >> > > > > > >> > the
> > > > >> > > > > > >> > > >> > > community
> > > > >> > > > > > >> > > >> > > > > will agree to add one more internal
> > topic
> > > > for
> > > > >> > this
> > > > >> > > > > plugin
> > > > >> > > > > > >> > impl.
> > > > >> > > > > > >> > > >> > > > > 4a. Can we modify the new messages to
> > the
> > > > >> > > > > > >> > __remote_log_metadata
> > > > >> > > > > > >> > > >> topic
> > > > >> > > > > > >> > > >> > > to
> > > > >> > > > > > >> > > >> > > > > contain the key and leave it to the
> user
> > > to
> > > > >> > enable
> > > > >> > > > > > >> compaction
> > > > >> > > > > > >> > > for
> > > > >> > > > > > >> > > >> > this
> > > > >> > > > > > >> > > >> > > > > topic if they need?
> > > > >> > > > > > >> > > >> > > > >
> > > > >> > > > > > >> > > >> > > > > Thanks,
> > > > >> > > > > > >> > > >> > > > > Kamal
> > > > >> > > > > > >> > > >> > > > >
> > > > >> > > > > > >> > > >> > > > > On Tue, Jan 6, 2026 at 7:35 AM Lijun
> > Tong
> > > <
> > > > >> > > > > > >> > > >> [email protected]>
> > > > >> > > > > > >> > > >> > > > wrote:
> > > > >> > > > > > >> > > >> > > > >
> > > > >> > > > > > >> > > >> > > > > > Hey Henry,
> > > > >> > > > > > >> > > >> > > > > >
> > > > >> > > > > > >> > > >> > > > > > Thank you for your time and
> response!
> > I
> > > > >> really
> > > > >> > > like
> > > > >> > > > > > your
> > > > >> > > > > > >> > > >> KIP-1248
> > > > >> > > > > > >> > > >> > > about
> > > > >> > > > > > >> > > >> > > > > > offloading the consumption of remote
> > log
> > > > >> away
> > > > >> > > from
> > > > >> > > > > the
> > > > >> > > > > > >> > broker,
> > > > >> > > > > > >> > > >> and
> > > > >> > > > > > >> > > >> > I
> > > > >> > > > > > >> > > >> > > > > think
> > > > >> > > > > > >> > > >> > > > > > with that change, the topic that
> > enables
> > > > the
> > > > >> > > tiered
> > > > >> > > > > > >> storage
> > > > >> > > > > > >> > > can
> > > > >> > > > > > >> > > >> > also
> > > > >> > > > > > >> > > >> > > > have
> > > > >> > > > > > >> > > >> > > > > > longer retention configurations and
> > > would
> > > > >> > benefit
> > > > >> > > > > from
> > > > >> > > > > > >> this
> > > > >> > > > > > >> > > KIP
> > > > >> > > > > > >> > > >> > too.
> > > > >> > > > > > >> > > >> > > > > >
> > > > >> > > > > > >> > > >> > > > > > Some suggestions: In your example
> > > > >> scenarios, it
> > > > >> > > > would
> > > > >> > > > > > >> also
> > > > >> > > > > > >> > be
> > > > >> > > > > > >> > > >> good
> > > > >> > > > > > >> > > >> > to
> > > > >> > > > > > >> > > >> > > > add
> > > > >> > > > > > >> > > >> > > > > > > an example of remote log segment
> > > > deletion
> > > > >> > > > triggered
> > > > >> > > > > > by
> > > > >> > > > > > >> > > >> retention
> > > > >> > > > > > >> > > >> > > > policy
> > > > >> > > > > > >> > > >> > > > > > > which will trigger generation of
> > > > tombstone
> > > > >> > > event
> > > > >> > > > > into
> > > > >> > > > > > >> > > metadata
> > > > >> > > > > > >> > > >> > > topic
> > > > >> > > > > > >> > > >> > > > > and
> > > > >> > > > > > >> > > >> > > > > > > trigger log compaction/deletion 24
> > > hour
> > > > >> > later,
> > > > >> > > I
> > > > >> > > > > > think
> > > > >> > > > > > >> > this
> > > > >> > > > > > >> > > is
> > > > >> > > > > > >> > > >> > the
> > > > >> > > > > > >> > > >> > > > key
> > > > >> > > > > > >> > > >> > > > > > > event to cap the metadata topic
> > size.
> > > > >> > > > > > >> > > >> > > > > >
> > > > >> > > > > > >> > > >> > > > > >
> > > > >> > > > > > >> > > >> > > > > > Regarding to this suggestion, I am
> not
> > > > sure
> > > > >> > > whether
> > > > >> > > > > > >> > Scenario 4
> > > > >> > > > > > >> > > >> > > > > > <
> > > > >> > > > > > >> > > >> > > > > >
> > > > >> > > > > > >> > > >> > > > >
> > > > >> > > > > > >> > > >> > > >
> > > > >> > > > > > >> > > >> > >
> > > > >> > > > > > >> > > >> >
> > > > >> > > > > > >> > > >>
> > > > >> > > > > > >> > >
> > > > >> > > > > > >> >
> > > > >> > > > > > >>
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406618613#KIP1266:BoundingTheNumberOfRemoteLogMetadataMessagesviaCompactedTopic-Scenario4:SegmentDeletion
> > > > >> > > > > > >> > > >> > > > > > >
> > > > >> > > > > > >> > > >> > > > > > has
> > > > >> > > > > > >> > > >> > > > > > covered it. I can add more rows in
> the
> > > > >> Timeline
> > > > >> > > > Table
> > > > >> > > > > > >> like
> > > > >> > > > > > >> > > >> > T5+24hour
> > > > >> > > > > > >> > > >> > > to
> > > > >> > > > > > >> > > >> > > > > > indicate the messages are gone by
> then
> > > to
> > > > >> > > > explicitly
> > > > >> > > > > > show
> > > > >> > > > > > >> > that
> > > > >> > > > > > >> > > >> > > messages
> > > > >> > > > > > >> > > >> > > > > are
> > > > >> > > > > > >> > > >> > > > > > deleted, thus the number of messages
> > are
> > > > >> capped
> > > > >> > > in
> > > > >> > > > > the
> > > > >> > > > > > >> > topic.
> > > > >> > > > > > >> > > >> > > > > >
> > > > >> > > > > > >> > > >> > > > > > Regarding whether the topic
> > > > >> > __remote_log_metadata
> > > > >> > > > is
> > > > >> > > > > > >> still
> > > > >> > > > > > >> > > >> > > necessary, I
> > > > >> > > > > > >> > > >> > > > > am
> > > > >> > > > > > >> > > >> > > > > > inclined to continue to have this
> > topic
> > > at
> > > > >> > least
> > > > >> > > > for
> > > > >> > > > > > >> > debugging
> > > > >> > > > > > >> > > >> > > purposes
> > > > >> > > > > > >> > > >> > > > > so
> > > > >> > > > > > >> > > >> > > > > > we can build confidence about the
> > > > compacted
> > > > >> > topic
> > > > >> > > > > > >> change, we
> > > > >> > > > > > >> > > can
> > > > >> > > > > > >> > > >> > > > > > always choose to remove this topic
> in
> > > the
> > > > >> > future
> > > > >> > > > once
> > > > >> > > > > > we
> > > > >> > > > > > >> all
> > > > >> > > > > > >> > > >> agree
> > > > >> > > > > > >> > > >> > it
> > > > >> > > > > > >> > > >> > > > > > provides limited value for the
> users.
> > > > >> > > > > > >> > > >> > > > > >
> > > > >> > > > > > >> > > >> > > > > > Thanks,
> > > > >> > > > > > >> > > >> > > > > > Lijun Tong
> > > > >> > > > > > >> > > >> > > > > >
> > > > >> > > > > > >> > > >> > > > > >
> > > > >> > > > > > >> > > >> > > > > > Henry Haiying Cai via dev <
> > > > >> > [email protected]>
> > > > >> > > > > > >> > 于2026年1月5日周一
> > > > >> > > > > > >> > > >> > > 16:19写道:
> > > > >> > > > > > >> > > >> > > > > >
> > > > >> > > > > > >> > > >> > > > > > > Lijun,
> > > > >> > > > > > >> > > >> > > > > > >
> > > > >> > > > > > >> > > >> > > > > > > Thanks for the proposal and I
> liked
> > > your
> > > > >> idea
> > > > >> > > of
> > > > >> > > > > > using
> > > > >> > > > > > >> a
> > > > >> > > > > > >> > > >> > compacted
> > > > >> > > > > > >> > > >> > > > > topic
> > > > >> > > > > > >> > > >> > > > > > > for tiered storage metadata topic.
> > > > >> > > > > > >> > > >> > > > > > >
> > > > >> > > > > > >> > > >> > > > > > > In our setup, we have set a
> shorter
> > > > >> retention
> > > > >> > > (3
> > > > >> > > > > > days)
> > > > >> > > > > > >> for
> > > > >> > > > > > >> > > the
> > > > >> > > > > > >> > > >> > > tiered
> > > > >> > > > > > >> > > >> > > > > > > storage metadata topic to control
> > the
> > > > size
> > > > >> > > > growth.
> > > > >> > > > > > We
> > > > >> > > > > > >> can
> > > > >> > > > > > >> > > do
> > > > >> > > > > > >> > > >> > that
> > > > >> > > > > > >> > > >> > > > > since
> > > > >> > > > > > >> > > >> > > > > > we
> > > > >> > > > > > >> > > >> > > > > > > control all topic's retention
> policy
> > > in
> > > > >> our
> > > > >> > > > > clusters
> > > > >> > > > > > >> and
> > > > >> > > > > > >> > we
> > > > >> > > > > > >> > > >> set a
> > > > >> > > > > > >> > > >> > > > > uniform
> > > > >> > > > > > >> > > >> > > > > > > retention.policy for all our
> tiered
> > > > >> storage
> > > > >> > > > topics.
> > > > >> > > > > > I
> > > > >> > > > > > >> can
> > > > >> > > > > > >> > > see
> > > > >> > > > > > >> > > >> > > other
> > > > >> > > > > > >> > > >> > > > > > > users/companies will not be able
> to
> > > > >> enforce
> > > > >> > > that
> > > > >> > > > > > >> retention
> > > > >> > > > > > >> > > >> policy
> > > > >> > > > > > >> > > >> > > to
> > > > >> > > > > > >> > > >> > > > > all
> > > > >> > > > > > >> > > >> > > > > > > tiered storage topics.
> > > > >> > > > > > >> > > >> > > > > > >
> > > > >> > > > > > >> > > >> > > > > > > Some suggestions: In your example
> > > > >> scenarios,
> > > > >> > it
> > > > >> > > > > would
> > > > >> > > > > > >> also
> > > > >> > > > > > >> > > be
> > > > >> > > > > > >> > > >> > good
> > > > >> > > > > > >> > > >> > > to
> > > > >> > > > > > >> > > >> > > > > add
> > > > >> > > > > > >> > > >> > > > > > > an example of remote log segment
> > > > deletion
> > > > >> > > > triggered
> > > > >> > > > > > by
> > > > >> > > > > > >> > > >> retention
> > > > >> > > > > > >> > > >> > > > policy
> > > > >> > > > > > >> > > >> > > > > > > which will trigger generation of
> > > > tombstone
> > > > >> > > event
> > > > >> > > > > into
> > > > >> > > > > > >> > > metadata
> > > > >> > > > > > >> > > >> > > topic
> > > > >> > > > > > >> > > >> > > > > and
> > > > >> > > > > > >> > > >> > > > > > > trigger log compaction/deletion 24
> > > hour
> > > > >> > later,
> > > > >> > > I
> > > > >> > > > > > think
> > > > >> > > > > > >> > this
> > > > >> > > > > > >> > > is
> > > > >> > > > > > >> > > >> > the
> > > > >> > > > > > >> > > >> > > > key
> > > > >> > > > > > >> > > >> > > > > > > event to cap the metadata topic
> > size.
> > > > >> > > > > > >> > > >> > > > > > >
> > > > >> > > > > > >> > > >> > > > > > > For the original unbounded
> > > > >> > remote_log_metadata
> > > > >> > > > > topic,
> > > > >> > > > > > >> I am
> > > > >> > > > > > >> > > not
> > > > >> > > > > > >> > > >> > sure
> > > > >> > > > > > >> > > >> > > > > > > whether we still need it or not.
> If
> > > it
> > > > is
> > > > >> > left
> > > > >> > > > > only
> > > > >> > > > > > >> for
> > > > >> > > > > > >> > > audit
> > > > >> > > > > > >> > > >> > > trail
> > > > >> > > > > > >> > > >> > > > > > > purpose, people can set up a data
> > > > >> ingestion
> > > > >> > > > > pipeline
> > > > >> > > > > > to
> > > > >> > > > > > >> > > ingest
> > > > >> > > > > > >> > > >> > the
> > > > >> > > > > > >> > > >> > > > > > content
> > > > >> > > > > > >> > > >> > > > > > > of metadata topic into a separate
> > > > storage
> > > > >> > > > location.
> > > > >> > > > > > I
> > > > >> > > > > > >> > think
> > > > >> > > > > > >> > > >> we
> > > > >> > > > > > >> > > >> > can
> > > > >> > > > > > >> > > >> > > > > have
> > > > >> > > > > > >> > > >> > > > > > a
> > > > >> > > > > > >> > > >> > > > > > > flag to have only one metadata
> topic
> > > > (the
> > > > >> > > > compacted
> > > > >> > > > > > >> > > version).
> > > > >> > > > > > >> > > >> > > > > > >
> > > > >> > > > > > >> > > >> > > > > > >
> > > > >> > > > > > >> > > >> > > > > > > On Monday, January 5, 2026 at
> > 01:22:42
> > > > PM
> > > > >> > PST,
> > > > >> > > > > Lijun
> > > > >> > > > > > >> Tong
> > > > >> > > > > > >> > <
> > > > >> > > > > > >> > > >> > > > > > > [email protected]> wrote:
> > > > >> > > > > > >> > > >> > > > > > >
> > > > >> > > > > > >> > > >> > > > > > >
> > > > >> > > > > > >> > > >> > > > > > >
> > > > >> > > > > > >> > > >> > > > > > >
> > > > >> > > > > > >> > > >> > > > > > >
> > > > >> > > > > > >> > > >> > > > > > > Hello Kafka Community,
> > > > >> > > > > > >> > > >> > > > > > >
> > > > >> > > > > > >> > > >> > > > > > > I would like to start a discussion
> > on
> > > > >> > KIP-1266,
> > > > >> > > > > which
> > > > >> > > > > > >> > > >> proposes to
> > > > >> > > > > > >> > > >> > > add
> > > > >> > > > > > >> > > >> > > > > > > another new compacted remote log
> > > > metadata
> > > > >> > topic
> > > > >> > > > for
> > > > >> > > > > > the
> > > > >> > > > > > >> > > tiered
> > > > >> > > > > > >> > > >> > > > storage,
> > > > >> > > > > > >> > > >> > > > > > to
> > > > >> > > > > > >> > > >> > > > > > > limit the number of messages that
> > need
> > > > to
> > > > >> be
> > > > >> > > > > iterated
> > > > >> > > > > > >> to
> > > > >> > > > > > >> > > build
> > > > >> > > > > > >> > > >> > the
> > > > >> > > > > > >> > > >> > > > > remote
> > > > >> > > > > > >> > > >> > > > > > > metadata state.
> > > > >> > > > > > >> > > >> > > > > > >
> > > > >> > > > > > >> > > >> > > > > > > KIP link: KIP-1266 Bounding The
> > Number
> > > > Of
> > > > >> > > > > > >> > RemoteLogMetadata
> > > > >> > > > > > >> > > >> > > Messages
> > > > >> > > > > > >> > > >> > > > > via
> > > > >> > > > > > >> > > >> > > > > > > Compacted RemoteLogMetadata Topic
> > > > >> > > > > > >> > > >> > > > > > > <
> > > > >> > > > > > >> > > >> > > > > > >
> > > > >> > > > > > >> > > >> > > > > >
> > > > >> > > > > > >> > > >> > > > >
> > > > >> > > > > > >> > > >> > > >
> > > > >> > > > > > >> > > >> > >
> > > > >> > > > > > >> > > >> >
> > > > >> > > > > > >> > > >>
> > > > >> > > > > > >> > >
> > > > >> > > > > > >> >
> > > > >> > > > > > >>
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1266%3A+Bounding+The+Number+Of+RemoteLogMetadata+Messages+via+Compacted+Topic
> > > > >> > > > > > >> > > >> > > > > > > >
> > > > >> > > > > > >> > > >> > > > > > >
> > > > >> > > > > > >> > > >> > > > > > > Background:
> > > > >> > > > > > >> > > >> > > > > > > The current Tiered Storage
> > > > implementation
> > > > >> > uses
> > > > >> > > a
> > > > >> > > > > > >> > > >> > > > __remote_log_metadata
> > > > >> > > > > > >> > > >> > > > > > > topic with infinite retention and
> > > > >> > delete-based
> > > > >> > > > > > cleanup
> > > > >> > > > > > >> > > policy,
> > > > >> > > > > > >> > > >> > > > causing
> > > > >> > > > > > >> > > >> > > > > > > unbounded growth, slow broker
> > > bootstrap,
> > > > >> no
> > > > >> > > > > mechanism
> > > > >> > > > > > >> to
> > > > >> > > > > > >> > > >> clean up
> > > > >> > > > > > >> > > >> > > > > expired
> > > > >> > > > > > >> > > >> > > > > > > segment metadata, and inefficient
> > > > >> re-reading
> > > > >> > > from
> > > > >> > > > > > >> offset 0
> > > > >> > > > > > >> > > >> during
> > > > >> > > > > > >> > > >> > > > > > > leadership changes.
> > > > >> > > > > > >> > > >> > > > > > >
> > > > >> > > > > > >> > > >> > > > > > > Proposal:
> > > > >> > > > > > >> > > >> > > > > > > A dual-topic approach that
> > introduces
> > > a
> > > > >> new
> > > > >> > > > > > >> > > >> > > > > > __remote_log_metadata_compacted
> > > > >> > > > > > >> > > >> > > > > > > topic using log compaction with
> > > > >> deterministic
> > > > >> > > > > > >> offset-based
> > > > >> > > > > > >> > > >> keys,
> > > > >> > > > > > >> > > >> > > > while
> > > > >> > > > > > >> > > >> > > > > > > preserving the existing topic for
> > > audit
> > > > >> > > history;
> > > > >> > > > > this
> > > > >> > > > > > >> > allows
> > > > >> > > > > > >> > > >> > > brokers
> > > > >> > > > > > >> > > >> > > > to
> > > > >> > > > > > >> > > >> > > > > > > build their metadata cache
> > exclusively
> > > > >> from
> > > > >> > the
> > > > >> > > > > > >> compacted
> > > > >> > > > > > >> > > >> topic,
> > > > >> > > > > > >> > > >> > > > > enables
> > > > >> > > > > > >> > > >> > > > > > > cleanup of expired segment
> metadata
> > > > >> through
> > > > >> > > > > > tombstones,
> > > > >> > > > > > >> > and
> > > > >> > > > > > >> > > >> > > includes
> > > > >> > > > > > >> > > >> > > > a
> > > > >> > > > > > >> > > >> > > > > > > migration strategy to populate the
> > new
> > > > >> topic
> > > > >> > > > during
> > > > >> > > > > > >> > > >> > > > upgrade—delivering
> > > > >> > > > > > >> > > >> > > > > > > bounded metadata growth and faster
> > > > broker
> > > > >> > > startup
> > > > >> > > > > > while
> > > > >> > > > > > >> > > >> > maintaining
> > > > >> > > > > > >> > > >> > > > > > > backward compatibility.
> > > > >> > > > > > >> > > >> > > > > > >
> > > > >> > > > > > >> > > >> > > > > > > More details are in the attached
> KIP
> > > > link.
> > > > >> > > > > > >> > > >> > > > > > > Looking forward to your thoughts.
> > > > >> > > > > > >> > > >> > > > > > >
> > > > >> > > > > > >> > > >> > > > > > > Thank you for your time!
> > > > >> > > > > > >> > > >> > > > > > >
> > > > >> > > > > > >> > > >> > > > > > > Best,
> > > > >> > > > > > >> > > >> > > > > > > Lijun Tong
> > > > >> > > > > > >> > > >> > > > > > >
> > > > >> > > > > > >> > > >> > > > > >
> > > > >> > > > > > >> > > >> > > > >
> > > > >> > > > > > >> > > >> > > >
> > > > >> > > > > > >> > > >> > >
> > > > >> > > > > > >> > > >> >
> > > > >> > > > > > >> > > >>
> > > > >> > > > > > >> > > >
> > > > >> > > > > > >> > >
> > > > >> > > > > > >> >
> > > > >> > > > > > >>
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
>

Reply via email to