Re: [DISCUSS] KIP-1058: Txn consumer exerts pressure on remote storage when reading non-txn topic

Divij Vaidya Mon, 04 Nov 2024 11:12:35 -0800

Sorry for the churn on this one. But circling back to the API.

Today, the API forces the plugins to have an optimization which can provide
the nextSegment with trx Idx directly. This goes against the
original invariant of KIP 405, where we wanted the plugins to be quite dumb
CRUD APIs.


An ideal API will allow a plugin to use the optimization where possible and
fall back to usual iteration through all segment metadata where
optimization is not available in the plugin.
Could we change the contract of the API which can leave this flexibility of
the optimization to the plugin developers? (The topic based RLMM would
continue to use the optimization you authored in the PR)

Thoughts?

--
Divij Vaidya



On Mon, Nov 4, 2024 at 3:01 PM Divij Vaidya <divijvaidy...@gmail.com> wrote:

> Thanks for patiently addressing all the comments.
>
> I will add the vote in the other thread.
>
> --
> Divij Vaidya
>
>
>
> On Mon, Nov 4, 2024 at 5:40 AM Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
>> Hi all,
>>
>> If the review comments are addressed and KIP looks good, please submit
>> your
>> vote on the voting thread.
>>
>> Thanks,
>> Kamal
>>
>> On Sat, Nov 2, 2024 at 7:52 PM Kamal Chandraprakash <
>> kamal.chandraprak...@gmail.com> wrote:
>>
>> > I've opened a PR which adds an integration test
>> > <
>> https://github.com/apache/kafka/pull/17668/files#diff-a60b518846fc0f770164d55b6d7cd31e03a002514377578c80c6e22cc120af40R88
>> >
>> > to show the impact of this change/KIP. PTAL.
>> >
>> > --
>> > Kamal
>> >
>> > On Fri, Nov 1, 2024 at 3:31 PM Kamal Chandraprakash <
>> > kamal.chandraprak...@gmail.com> wrote:
>> >
>> >> Thanks for the review!
>> >>
>> >> > 5
>> >> Updated the KIP to include the `isEmpty` method in the transaction
>> index
>> >>
>> >> > 6
>> >> You're right. The offset parameter will be equal to the
>> >> next-segment-to-consider base offset.
>> >> But, the API introduced in RLMM is for *one* epoch. The next epoch
>> >> start-offset may not be
>> >> the base-offset. You can refer to this line
>> >> <
>> https://github.com/apache/kafka/pull/17659/files#diff-380e4d8859ea9148f21794c09039425c82d9012a392c2dbbe1ce2ec8677a1970R1857
>> >
>> >> in the draft PR.
>> >>
>> >>
>> >> On Fri, Nov 1, 2024 at 3:16 PM Divij Vaidya <divijvaidy...@gmail.com>
>> >> wrote:
>> >>
>> >>> Last few things -
>> >>>
>> >>> # 5
>> >>> About setting the TrxIndexEmpty field, could we introduce an isEmpty()
>> >>> function in TransactionIndex which has the following implementation:
>> >>>
>> >>> public boolean isEmpty() {
>> >>>     return !iterable().iterator().hasNext();
>> >>> }
>> >>>
>> >>>
>> >>> The advantages of this approach is:
>> >>> 1. It works for both cases when the file is not present and also when
>> the
>> >>> file is present but is empty.
>> >>> 2. It prevents leaking the underlying implementation of
>> TransactionIndex
>> >>> outside via the file() method. I think that making file() as public
>> is an
>> >>> implementation leak (for example, what is the trx indx is not file
>> >>> based!).
>> >>>
>> >>>
>> >>> #6
>> >>> In the documentation for nextSegmentWithTxnIndex, the offset parameter
>> >>> should be equal to the next-segment-to-consider's base offset, no?
>> >>> I assume that we will add a new fetch here with nextSegmentBaseOffset
>> >>>
>> >>>
>> https://github.com/apache/kafka/blob/346fdbafc539bc48bb66eedae89a15e240007fd9/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1801
>> >>> . Is there a case where the parameter "offset" will not be equal to
>> the
>> >>> baseOffset of a segment?
>> >>>
>> >>> --
>> >>> Divij Vaidya
>> >>>
>> >>>
>> >>>
>> >>> On Fri, Nov 1, 2024 at 10:26 AM Kamal Chandraprakash <
>> >>> kamal.chandraprak...@gmail.com> wrote:
>> >>>
>> >>> > Hi Divij,
>> >>> >
>> >>> > Thanks for the detailed review!
>> >>> >
>> >>> > > 1, 2, 3, 4
>> >>> > Updated the KIP-1058
>> >>> > <
>> >>> >
>> >>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1058%3A+Txn+consumer+exerts+pressure+on+remote+storage+when+collecting+aborted+transactions
>> >>> > >
>> >>> > with the feedback received and also opened a draft PR for #17659
>> >>> > <https://github.com/apache/kafka/pull/17659/files> reference.
>> >>> > PTAL.
>> >>> >
>> >>> > > 5. How are we determining the value of the TrxIndexEmpty field on
>> >>> segment
>> >>> > rotation?
>> >>> > Transaction index file is optional, the file does not exists when
>> >>> there are
>> >>> > no aborted txn entries for a
>> >>> > segment, we will be using the file null check. Also, updated it in
>> the
>> >>> KIP.
>> >>> >
>> >>> > Thanks,
>> >>> > Kamal
>> >>> >
>> >>> >
>> >>> > On Tue, Oct 29, 2024 at 8:47 PM Divij Vaidya <
>> divijvaidy...@gmail.com>
>> >>> > wrote:
>> >>> >
>> >>> > > A few more points to discuss (please add to the KIP as well)
>> >>> > >
>> >>> > > 5. How are we determining the value of the TrxIndexEmpty field on
>> >>> segment
>> >>> > > rotation?
>> >>> > >
>> >>> > > One option is to do a boolean txnIdxEmpty =
>> >>> > > segment.txnIndex().allAbortedTxns().isEmpty() but this will have
>> an
>> >>> > > overhead of reading the contents of the file and storing them in
>> >>> memory,
>> >>> > > when we have a non-empty index.
>> >>> > > The other option (preferred) is to add an isEmpty() public method
>> to
>> >>> the
>> >>> > > TransactionIndex and perform a segment.txnIndex().isEmpty() check
>> >>> which
>> >>> > > will internally use Files.size() java API.
>> >>> > >
>> >>> > > On Tue, Oct 29, 2024 at 1:21 PM Divij Vaidya <
>> >>> divijvaidy...@gmail.com>
>> >>> > > wrote:
>> >>> > >
>> >>> > > > Let's get the ball rolling (again) on this one.
>> >>> > > >
>> >>> > > > Kamal, could you please add the following to the KIP:
>> >>> > > > 1. the API as discussed above. Please add the failure modes for
>> >>> this
>> >>> > API
>> >>> > > > as well such as the exceptions thrown and a recommendation on
>> how a
>> >>> > > caller
>> >>> > > > is expected to handle those. I am assuming that the three
>> >>> parameters
>> >>> > for
>> >>> > > > this API will be topicPartition, epoch and offset.
>> >>> > > > 2. implementation details for Topic based RLMM. I am assuming
>> that
>> >>> the
>> >>> > > > plugin will default the field to false if this field is absent
>> >>> (case of
>> >>> > > old
>> >>> > > > metadata).
>> >>> > > > 3. In the test plan section, additionally, we need to assert
>> that
>> >>> we
>> >>> > > don't
>> >>> > > > read metadata for all segments (i.e. it is not a linear search)
>> >>> from
>> >>> > the
>> >>> > > > Topic based RLMM.
>> >>> > > > 4. in the compatibility section, please document how the
>> existing
>> >>> > > clusters
>> >>> > > > with Tiered Storage metadata will work during/after a rolling
>> >>> upgrade
>> >>> > to
>> >>> > > a
>> >>> > > > version which contains this new change.
>> >>> > > >
>> >>> > > > --
>> >>> > > > Divij Vaidya
>> >>> > > >
>> >>> > > >
>> >>> > > >
>> >>> > > > On Fri, Oct 11, 2024 at 12:26 PM Kamal Chandraprakash <
>> >>> > > > kamal.chandraprak...@gmail.com> wrote:
>> >>> > > >
>> >>> > > >> Bump for review.
>> >>> > > >>
>> >>> > > >> If the additional proposal looks good, I'll append them to the
>> >>> KIP.
>> >>> > > PTAL.
>> >>> > > >>
>> >>> > > >> New API in RLMM#nextRemoteLogSegmentMetadataWithTxnIndex
>> >>> > > >>
>> >>> > > >> --
>> >>> > > >> Kamal
>> >>> > > >>
>> >>> > > >> On Sun, Oct 6, 2024 at 7:20 PM Kamal Chandraprakash <
>> >>> > > >> kamal.chandraprak...@gmail.com> wrote:
>> >>> > > >>
>> >>> > > >> > Hi Christo,
>> >>> > > >> >
>> >>> > > >> > Thanks for the review!
>> >>> > > >> >
>> >>> > > >> > Adding the new API
>> `nextRemoteLogSegmentMetadataWithTxnIndex` in
>> >>> > RLMM
>> >>> > > >> > helps to
>> >>> > > >> > reduce the complexity of linear search. With this API, we
>> have
>> >>> to:
>> >>> > > >> >
>> >>> > > >> > 1. Maintain one more skip-list [1] for each of the epochs in
>> the
>> >>> > > >> partition
>> >>> > > >> > in RLMM that might
>> >>> > > >> >     increase the memory usage of TopicBased RLMM
>> implementation.
>> >>> > > >> >     1a) The skip-list will be empty when there are no aborted
>> >>> txn
>> >>> > > >> entries
>> >>> > > >> > for a partition/epoch which is the predominant case.
>> >>> > > >> >     1b) The skip-list will act as a duplicate when *most* of
>> the
>> >>> > > >> segments
>> >>> > > >> > have aborted txn entries, assuming aborted txn are quite low,
>> >>> this
>> >>> > > >> should
>> >>> > > >> > be fine.
>> >>> > > >> > 2. Change the logic to retrieve the aborted txns (we have to
>> >>> query
>> >>> > the
>> >>> > > >> > nextRLSMWithTxnIndex
>> >>> > > >> >     for each of the leader-epoch).
>> >>> > > >> > 3. Logic divergence from how we retrieve the aborted txn
>> entries
>> >>> > > >> compared
>> >>> > > >> > to the local-log.
>> >>> > > >> >
>> >>> > > >> > The approach looks good to me. If everyone is aligned, then
>> we
>> >>> can
>> >>> > > >> proceed
>> >>> > > >> > to add this API to RLMM.
>> >>> > > >> >
>> >>> > > >> > Another option I was thinking of is to capture the
>> >>> > > `lastStableOffsetLag`
>> >>> > > >> > [2] while rotating the segment.
>> >>> > > >> > But, that is a bigger change we can take later.
>> >>> > > >> >
>> >>> > > >> > [1]:
>> >>> > > >> >
>> >>> > > >>
>> >>> > >
>> >>> >
>> >>>
>> https://sourcegraph.com/github.com/apache/kafka/-/blob/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/RemoteLogLeaderEpochState.java?L43
>> >>> > > >> > [2]:
>> >>> > > >> >
>> >>> > > >>
>> >>> > >
>> >>> >
>> >>>
>> https://sourcegraph.com/github.com/apache/kafka/-/blob/core/src/main/scala/kafka/log/UnifiedLog.scala?L432
>> >>> > > >> >
>> >>> > > >> >
>> >>> > > >> > Thanks,
>> >>> > > >> > Kamal
>> >>> > > >> >
>> >>> > > >> > On Fri, Oct 4, 2024 at 4:21 PM Christo Lolov <
>> >>> > christolo...@gmail.com>
>> >>> > > >> > wrote:
>> >>> > > >> >
>> >>> > > >> >> Heya,
>> >>> > > >> >>
>> >>> > > >> >> Apologies for the delay. I have been thinking about this
>> >>> problem
>> >>> > > >> recently
>> >>> > > >> >> as well and while I believe storing a boolean in the
>> metadata
>> >>> is
>> >>> > > good,
>> >>> > > >> I
>> >>> > > >> >> think we can do better by introducing a new method to the
>> RLMM
>> >>> > along
>> >>> > > >> the
>> >>> > > >> >> lines of
>> >>> > > >> >>
>> >>> > > >> >> Optional<RemoteLogSegmentMetadata>
>> >>> > > >> >> nextRemoteLogSegmentMetadataWithTxnIndex(TopicIdPartition
>> >>> > > >> >> topicIdPartition,
>> >>> > > >> >> int epochForOffset, long offset) throws
>> RemoteStorageException
>> >>> > > >> >>
>> >>> > > >> >> This will help plugin implementers to build optimisations
>> such
>> >>> as
>> >>> > > skip
>> >>> > > >> >> lists which will give them the next segment quicker than a
>> >>> linear
>> >>> > > >> search.
>> >>> > > >> >>
>> >>> > > >> >> I am keen to hear your thoughts!
>> >>> > > >> >>
>> >>> > > >> >> Best,
>> >>> > > >> >> Christo
>> >>> > > >> >>
>> >>> > > >> >> On Fri, 4 Oct 2024 at 10:48, Kamal Chandraprakash <
>> >>> > > >> >> kamal.chandraprak...@gmail.com> wrote:
>> >>> > > >> >>
>> >>> > > >> >> > Hi Luke,
>> >>> > > >> >> >
>> >>> > > >> >> > Thanks for the review!
>> >>> > > >> >> >
>> >>> > > >> >> > > Do you think it is helpful if we store the "least abort
>> >>> start
>> >>> > > >> offset
>> >>> > > >> >> in
>> >>> > > >> >> > the
>> >>> > > >> >> > segment", and -1 means no txnIndex. So that we can have a
>> >>> way to
>> >>> > > know
>> >>> > > >> >> if we
>> >>> > > >> >> > need to fetch this txn index or not.
>> >>> > > >> >> >
>> >>> > > >> >> > 1. No, this change won't have an effect. To find the
>> >>> upper-bound
>> >>> > > >> offset
>> >>> > > >> >> > [1], we have to
>> >>> > > >> >> >     fetch that segment's offset index file. The
>> >>> RemoteIndexCache
>> >>> > > [2]
>> >>> > > >> >> > fetches all the 3
>> >>> > > >> >> >     index files together and caches them for subsequent
>> use,
>> >>> so
>> >>> > > this
>> >>> > > >> >> > improvement
>> >>> > > >> >> >     won't have an effect as the current segment txn index
>> >>> gets
>> >>> > > >> >> downloaded
>> >>> > > >> >> > anyway.
>> >>> > > >> >> >
>> >>> > > >> >> > 2. The reason for choosing boolean is to make the change
>> >>> backward
>> >>> > > >> >> > compatible.
>> >>> > > >> >> >      There can be existing RLM events for the uploaded
>> >>> segments.
>> >>> > > The
>> >>> > > >> >> > default
>> >>> > > >> >> >      value of `txnIdxEmpty` is false so the *old* RLM
>> events
>> >>> are
>> >>> > > >> >> assumed to
>> >>> > > >> >> > contain
>> >>> > > >> >> >      the txn index files and those files are downloaded if
>> >>> they
>> >>> > > >> exist.
>> >>> > > >> >> >
>> >>> > > >> >> > [1]:
>> >>> > > >> >> >
>> >>> > > >> >> >
>> >>> > > >> >>
>> >>> > > >>
>> >>> > >
>> >>> >
>> >>>
>> https://sourcegraph.com/github.com/apache/kafka@trunk/-/blob/core/src/main/java/kafka/log/remote/RemoteLogManager.java?L1732
>> >>> > > >> >> > [2]:
>> >>> > > >> >> >
>> >>> > > >> >> >
>> >>> > > >> >>
>> >>> > > >>
>> >>> > >
>> >>> >
>> >>>
>> https://sourcegraph.com/github.com/apache/kafka@trunk/-/blob/storage/src/main/java/org/apache/kafka/storage/internals/log/RemoteIndexCache.java?L383
>> >>> > > >> >> >
>> >>> > > >> >> > Thanks,
>> >>> > > >> >> > Kamal
>> >>> > > >> >> >
>> >>> > > >> >> > On Thu, Oct 3, 2024 at 3:11 PM Luke Chen <
>> show...@gmail.com>
>> >>> > > wrote:
>> >>> > > >> >> >
>> >>> > > >> >> > > Hi Kamal,
>> >>> > > >> >> > >
>> >>> > > >> >> > > Sorry for the late review.
>> >>> > > >> >> > > Thanks for the KIP, this will improve the transaction
>> >>> reading
>> >>> > for
>> >>> > > >> >> remote
>> >>> > > >> >> > > storage.
>> >>> > > >> >> > > Overall LGTM, just one minor thought:
>> >>> > > >> >> > >
>> >>> > > >> >> > > Currently, we only store the `TxnIndexEmpty` bool value
>> in
>> >>> the
>> >>> > > >> segment
>> >>> > > >> >> > > metadata.
>> >>> > > >> >> > > Do you think it is helpful if we store the "least abort
>> >>> start
>> >>> > > >> offset
>> >>> > > >> >> in
>> >>> > > >> >> > the
>> >>> > > >> >> > > segment", and -1 means no txnIndex. So that we can have
>> a
>> >>> way
>> >>> > to
>> >>> > > >> know
>> >>> > > >> >> if
>> >>> > > >> >> > we
>> >>> > > >> >> > > need to fetch this txn index or not.
>> >>> > > >> >> > >
>> >>> > > >> >> > > Thanks.
>> >>> > > >> >> > > Luke
>> >>> > > >> >> > >
>> >>> > > >> >> > > On Mon, Sep 9, 2024 at 3:26 PM Kamal Chandraprakash <
>> >>> > > >> >> > > kamal.chandraprak...@gmail.com> wrote:
>> >>> > > >> >> > >
>> >>> > > >> >> > > > Hi all,
>> >>> > > >> >> > > >
>> >>> > > >> >> > > > If there are no more comments, I'll start a voting
>> thread
>> >>> > soon.
>> >>> > > >> >> > > >
>> >>> > > >> >> > > > Thanks,
>> >>> > > >> >> > > > Kamal
>> >>> > > >> >> > > >
>> >>> > > >> >> > > > On Fri, Sep 6, 2024 at 7:28 PM Kamal Chandraprakash <
>> >>> > > >> >> > > > kamal.chandraprak...@gmail.com> wrote:
>> >>> > > >> >> > > >
>> >>> > > >> >> > > > > Bumping this thread again for review!
>> >>> > > >> >> > > > >
>> >>> > > >> >> > > > > Reduced the scope of the proposal to minimum. We
>> will
>> >>> be
>> >>> > > adding
>> >>> > > >> >> only
>> >>> > > >> >> > > one
>> >>> > > >> >> > > > > field (txnIdxEmpty) to the
>> >>> > > >> >> > > > > RemoteLogSegmentMetadata event which is backward
>> >>> > compatible.
>> >>> > > >> PTAL.
>> >>> > > >> >> > > > >
>> >>> > > >> >> > > > > Thanks,
>> >>> > > >> >> > > > > Kamal
>> >>> > > >> >> > > > >
>> >>> > > >> >> > > > >
>> >>> > > >> >> > > > > On Tue, Aug 13, 2024 at 6:33 PM Kamal
>> Chandraprakash <
>> >>> > > >> >> > > > > kamal.chandraprak...@gmail.com> wrote:
>> >>> > > >> >> > > > >
>> >>> > > >> >> > > > >> Bumping this thread for KIP review!
>> >>> > > >> >> > > > >>
>> >>> > > >> >> > > > >> We can go for the simplest solution that is
>> proposed
>> >>> in
>> >>> > this
>> >>> > > >> KIP
>> >>> > > >> >> and
>> >>> > > >> >> > > > >> it can be improved in the subsequent iteration.
>> PTAL.
>> >>> > > >> >> > > > >>
>> >>> > > >> >> > > > >> Thanks,
>> >>> > > >> >> > > > >> Kamal
>> >>> > > >> >> > > > >>
>> >>> > > >> >> > > > >> On Fri, Aug 2, 2024 at 11:42 AM Kamal
>> Chandraprakash <
>> >>> > > >> >> > > > >> kamal.chandraprak...@gmail.com> wrote:
>> >>> > > >> >> > > > >>
>> >>> > > >> >> > > > >>> Hi Divij,
>> >>> > > >> >> > > > >>>
>> >>> > > >> >> > > > >>> Thanks for the review! And, sorry for the late
>> reply.
>> >>> > > >> >> > > > >>>
>> >>> > > >> >> > > > >>> From the UnifiedLog.scala
>> >>> > > >> >> > > > >>> <
>> >>> > > >> >> > > >
>> >>> > > >> >> > >
>> >>> > > >> >> >
>> >>> > > >> >>
>> >>> > > >>
>> >>> > >
>> >>> >
>> >>>
>> https://sourcegraph.com/github.com/apache/kafka@trunk/-/blob/core/src/main/scala/kafka/log/UnifiedLog.scala?L421-427
>> >>> > > >> >> > > > >
>> >>> > > >> >> > > > >>> doc:
>> >>> > > >> >> > > > >>>
>> >>> > > >> >> > > > >>> """
>> >>> > > >> >> > > > >>> The last stable offset (LSO) is defined as the
>> first
>> >>> > offset
>> >>> > > >> such
>> >>> > > >> >> > that
>> >>> > > >> >> > > > >>> all lower offsets have been "decided."
>> >>> > > >> >> > > > >>>    * Non-transactional messages are considered
>> >>> decided
>> >>> > > >> >> immediately,
>> >>> > > >> >> > > but
>> >>> > > >> >> > > > >>> transactional messages are only decided when
>> >>> > > >> >> > > > >>>    * the corresponding COMMIT or ABORT marker is
>> >>> written.
>> >>> > > >> This
>> >>> > > >> >> > > implies
>> >>> > > >> >> > > > >>> that the last stable offset will be equal
>> >>> > > >> >> > > > >>>    * to the high watermark if there are no
>> >>> transactional
>> >>> > > >> >> messages
>> >>> > > >> >> > in
>> >>> > > >> >> > > > the
>> >>> > > >> >> > > > >>> log. Note also that the LSO cannot advance
>> >>> > > >> >> > > > >>>    * beyond the high watermark.
>> >>> > > >> >> > > > >>> """
>> >>> > > >> >> > > > >>> While rolling the active segment to passive, if
>> LSO
>> >>> > equals
>> >>> > > to
>> >>> > > >> >> HW,
>> >>> > > >> >> > > then
>> >>> > > >> >> > > > >>> all the messages in that segment are
>> >>> > > >> >> > > > >>> decided and we can store the `lastStableOffsetLag`
>> >>> as an
>> >>> > > >> >> attribute
>> >>> > > >> >> > in
>> >>> > > >> >> > > > >>> the rolled segment. We can then propagate
>> >>> > > >> >> > > > >>> the `lastStableOffsetLag` information in the
>> >>> > > >> RemoteLogMetadata
>> >>> > > >> >> > > events.
>> >>> > > >> >> > > > >>>
>> >>> > > >> >> > > > >>> While reading the remote log segment, if the
>> >>> > > >> >> `lastStableOffsetLag`
>> >>> > > >> >> > is
>> >>> > > >> >> > > > 0,
>> >>> > > >> >> > > > >>> then there is no need to traverse to
>> >>> > > >> >> > > > >>> the subsequent segments for aborted transactions
>> >>> which
>> >>> > > covers
>> >>> > > >> >> the
>> >>> > > >> >> > > case
>> >>> > > >> >> > > > >>> for the dominant case where the
>> >>> > > >> >> > > > >>> partition had no transactions at all.
>> >>> > > >> >> > > > >>>
>> >>> > > >> >> > > > >>> With Log compaction, the shrinked segments might
>> get
>> >>> > > merged.
>> >>> > > >> One
>> >>> > > >> >> > > option
>> >>> > > >> >> > > > >>> is to take the max of `lastStableOffsetLag`
>> >>> > > >> >> > > > >>> and store it in the new LogSegment. Since, the
>> tiered
>> >>> > > storage
>> >>> > > >> >> does
>> >>> > > >> >> > > not
>> >>> > > >> >> > > > >>> support compacted topics / historical compacted
>> >>> > > >> >> > > > >>> topics, we can omit this case.
>> >>> > > >> >> > > > >>>
>> >>> > > >> >> > > > >>> If this approach looks good, I can update the KIP
>> >>> with
>> >>> > the
>> >>> > > >> >> details.
>> >>> > > >> >> > > > >>>
>> >>> > > >> >> > > > >>> --
>> >>> > > >> >> > > > >>> Kamal
>> >>> > > >> >> > > > >>>
>> >>> > > >> >> > > > >>>
>> >>> > > >> >> > > > >>>
>> >>> > > >> >> > > > >>>
>> >>> > > >> >> > > > >>> On Tue, Jun 25, 2024 at 4:24 PM Divij Vaidya <
>> >>> > > >> >> > > divijvaidy...@gmail.com>
>> >>> > > >> >> > > > >>> wrote:
>> >>> > > >> >> > > > >>>
>> >>> > > >> >> > > > >>>> Hi Kamal
>> >>> > > >> >> > > > >>>>
>> >>> > > >> >> > > > >>>> Thanks for the bump. I have been thinking about
>> this
>> >>> > > >> passively
>> >>> > > >> >> for
>> >>> > > >> >> > > the
>> >>> > > >> >> > > > >>>> past
>> >>> > > >> >> > > > >>>> few days.
>> >>> > > >> >> > > > >>>>
>> >>> > > >> >> > > > >>>> The simplest solution is to store a state at
>> segment
>> >>> > level
>> >>> > > >> >> > metadata.
>> >>> > > >> >> > > > The
>> >>> > > >> >> > > > >>>> state should specify whether the trx index is
>> empty
>> >>> or
>> >>> > > not.
>> >>> > > >> It
>> >>> > > >> >> > would
>> >>> > > >> >> > > > be
>> >>> > > >> >> > > > >>>> populated during segment archival. We would then
>> >>> iterate
>> >>> > > >> over
>> >>> > > >> >> the
>> >>> > > >> >> > > > >>>> metadata
>> >>> > > >> >> > > > >>>> for future segments without having to make a
>> remote
>> >>> call
>> >>> > > to
>> >>> > > >> >> > download
>> >>> > > >> >> > > > the
>> >>> > > >> >> > > > >>>> trx index itself.
>> >>> > > >> >> > > > >>>>
>> >>> > > >> >> > > > >>>> The other solution for storing state at a
>> partition
>> >>> > level
>> >>> > > >> >> wouldn't
>> >>> > > >> >> > > > >>>> work, as
>> >>> > > >> >> > > > >>>> you mentioned, because we will have to change the
>> >>> state
>> >>> > on
>> >>> > > >> >> every
>> >>> > > >> >> > > > >>>> mutation
>> >>> > > >> >> > > > >>>> to the log i.e. at expiration of segments and
>> >>> append.
>> >>> > > >> >> > > > >>>>
>> >>> > > >> >> > > > >>>> I have been thinking whether we can do something
>> >>> better
>> >>> > > than
>> >>> > > >> >> the
>> >>> > > >> >> > > > simple
>> >>> > > >> >> > > > >>>> solution, hence the delay in replying. Let me
>> tell
>> >>> you
>> >>> > my
>> >>> > > >> half
>> >>> > > >> >> > baked
>> >>> > > >> >> > > > >>>> train
>> >>> > > >> >> > > > >>>> of thoughts, perhaps, you can explore this as
>> well.
>> >>> I
>> >>> > have
>> >>> > > >> been
>> >>> > > >> >> > > > thinking
>> >>> > > >> >> > > > >>>> about using LSO (last stable offset) to handle
>> the
>> >>> case
>> >>> > > when
>> >>> > > >> >> the
>> >>> > > >> >> > > > >>>> partition
>> >>> > > >> >> > > > >>>> never had any transactions. For a partition which
>> >>> never
>> >>> > > had
>> >>> > > >> any
>> >>> > > >> >> > > > >>>> transaction, I would assume that the LSO is never
>> >>> > > >> initialized
>> >>> > > >> >> (or
>> >>> > > >> >> > is
>> >>> > > >> >> > > > >>>> equal
>> >>> > > >> >> > > > >>>> to log start offset)? Or is it equal to HW in
>> that
>> >>> case?
>> >>> > > >> This
>> >>> > > >> >> is
>> >>> > > >> >> > > > >>>> something
>> >>> > > >> >> > > > >>>> that I am yet to verify. If this idea works,
>> then we
>> >>> > would
>> >>> > > >> not
>> >>> > > >> >> > have
>> >>> > > >> >> > > to
>> >>> > > >> >> > > > >>>> iterate through the metadata for the dominant
>> case
>> >>> where
>> >>> > > the
>> >>> > > >> >> > > partition
>> >>> > > >> >> > > > >>>> had
>> >>> > > >> >> > > > >>>> no transactions at all.
>> >>> > > >> >> > > > >>>>
>> >>> > > >> >> > > > >>>> --
>> >>> > > >> >> > > > >>>> Divij Vaidya
>> >>> > > >> >> > > > >>>>
>> >>> > > >> >> > > > >>>>
>> >>> > > >> >> > > > >>>>
>> >>> > > >> >> > > > >>>> On Tue, Jun 25, 2024 at 11:42 AM Kamal
>> >>> Chandraprakash <
>> >>> > > >> >> > > > >>>> kamal.chandraprak...@gmail.com> wrote:
>> >>> > > >> >> > > > >>>>
>> >>> > > >> >> > > > >>>> > Bump. Please review this proposal.
>> >>> > > >> >> > > > >>>> >
>> >>> > > >> >> > > > >>>> >
>> >>> > > >> >> > > > >>>> > On Mon, Jun 17, 2024 at 6:55 PM Kamal
>> >>> Chandraprakash <
>> >>> > > >> >> > > > >>>> > kamal.chandraprak...@gmail.com> wrote:
>> >>> > > >> >> > > > >>>> >
>> >>> > > >> >> > > > >>>> > > Divij,
>> >>> > > >> >> > > > >>>> > >
>> >>> > > >> >> > > > >>>> > > Thanks for the review! Updated the KIP with
>> 1,
>> >>> 2, 3,
>> >>> > > >> and 4
>> >>> > > >> >> > > review
>> >>> > > >> >> > > > >>>> > > comments.
>> >>> > > >> >> > > > >>>> > >
>> >>> > > >> >> > > > >>>> > > > 4. Potential alternative - Instead of
>> having
>> >>> an
>> >>> > > >> algorithm
>> >>> > > >> >> > > where
>> >>> > > >> >> > > > we
>> >>> > > >> >> > > > >>>> > > traverse
>> >>> > > >> >> > > > >>>> > > across segment metadata and looking for
>> >>> > isTxnIdxEmpty
>> >>> > > >> flag,
>> >>> > > >> >> > > should
>> >>> > > >> >> > > > >>>> we
>> >>> > > >> >> > > > >>>> > > directly introduce a nextSegmentWithTrxInx()
>> >>> > function?
>> >>> > > >> This
>> >>> > > >> >> > > would
>> >>> > > >> >> > > > >>>> allow
>> >>> > > >> >> > > > >>>> > > implementers to optimize the otherwise linear
>> >>> scan
>> >>> > > >> across
>> >>> > > >> >> > > metadata
>> >>> > > >> >> > > > >>>> for
>> >>> > > >> >> > > > >>>> > all
>> >>> > > >> >> > > > >>>> > > segments by using techniques such as skip
>> list
>> >>> etc.
>> >>> > > >> >> > > > >>>> > >
>> >>> > > >> >> > > > >>>> > > This is a good point to optimize the scan. We
>> >>> need
>> >>> > to
>> >>> > > >> >> maintain
>> >>> > > >> >> > > the
>> >>> > > >> >> > > > >>>> > > skip-list
>> >>> > > >> >> > > > >>>> > > for each leader-epoch. With unclean leader
>> >>> election,
>> >>> > > >> some
>> >>> > > >> >> > > brokers
>> >>> > > >> >> > > > >>>> may not
>> >>> > > >> >> > > > >>>> > > have
>> >>> > > >> >> > > > >>>> > > the complete lineage. This will expand the
>> >>> scope of
>> >>> > > the
>> >>> > > >> >> work.
>> >>> > > >> >> > > > >>>> > >
>> >>> > > >> >> > > > >>>> > > In this version, we plan to optimize only for
>> >>> the
>> >>> > > below
>> >>> > > >> 2
>> >>> > > >> >> > cases:
>> >>> > > >> >> > > > >>>> > >
>> >>> > > >> >> > > > >>>> > > 1. A partition does not have the transaction
>> >>> index
>> >>> > for
>> >>> > > >> any
>> >>> > > >> >> of
>> >>> > > >> >> > > the
>> >>> > > >> >> > > > >>>> > uploaded
>> >>> > > >> >> > > > >>>> > > segments.
>> >>> > > >> >> > > > >>>> > >    The individual log segments
>> `isTxnIdxEmpty`
>> >>> flag
>> >>> > > can
>> >>> > > >> be
>> >>> > > >> >> > > reduced
>> >>> > > >> >> > > > >>>> to a
>> >>> > > >> >> > > > >>>> > > single flag
>> >>> > > >> >> > > > >>>> > >    in RLMM (using AND operator) that can
>> serve
>> >>> the
>> >>> > > >> query -
>> >>> > > >> >> "Is
>> >>> > > >> >> > > all
>> >>> > > >> >> > > > >>>> the
>> >>> > > >> >> > > > >>>> > > transaction indexes empty for a partition?".
>> >>> > > >> >> > > > >>>> > >    If yes, then we can directly scan the
>> >>> local-log
>> >>> > for
>> >>> > > >> >> aborted
>> >>> > > >> >> > > > >>>> > > transactions.
>> >>> > > >> >> > > > >>>> > > 2. A partition is produced using the
>> >>> transactional
>> >>> > > >> >> producer.
>> >>> > > >> >> > The
>> >>> > > >> >> > > > >>>> > > assumption made is that
>> >>> > > >> >> > > > >>>> > >     the transaction will either
>> commit/rollback
>> >>> > within
>> >>> > > >> 15
>> >>> > > >> >> > > minutes
>> >>> > > >> >> > > > >>>> > >     (default transaction.max.timeout.ms = 15
>> >>> mins),
>> >>> > > >> >> possibly
>> >>> > > >> >> > we
>> >>> > > >> >> > > > >>>> may have
>> >>> > > >> >> > > > >>>> > > to search only
>> >>> > > >> >> > > > >>>> > >     a few consecutive remote log segments to
>> >>> collect
>> >>> > > the
>> >>> > > >> >> > aborted
>> >>> > > >> >> > > > >>>> > > transactions.
>> >>> > > >> >> > > > >>>> > > 3. A partition is being produced with both
>> >>> normal
>> >>> > and
>> >>> > > >> >> > > > transactional
>> >>> > > >> >> > > > >>>> > > producers. In this case,
>> >>> > > >> >> > > > >>>> > >     we will be doing linear traversal.
>> >>> Maintaining a
>> >>> > > >> >> skip-list
>> >>> > > >> >> > > > might
>> >>> > > >> >> > > > >>>> > > improve the performance but
>> >>> > > >> >> > > > >>>> > >     we delegate the RLMM implementation to
>> >>> users. If
>> >>> > > >> >> > implemented
>> >>> > > >> >> > > > >>>> > > incorrectly, then it can lead
>> >>> > > >> >> > > > >>>> > >     to delivery of the aborted transaction
>> >>> records
>> >>> > to
>> >>> > > >> the
>> >>> > > >> >> > > > consumer.
>> >>> > > >> >> > > > >>>> > >
>> >>> > > >> >> > > > >>>> > > I notice two drawbacks with the reduction
>> >>> method as
>> >>> > > >> >> proposed
>> >>> > > >> >> > in
>> >>> > > >> >> > > > the
>> >>> > > >> >> > > > >>>> KIP:
>> >>> > > >> >> > > > >>>> > >
>> >>> > > >> >> > > > >>>> > > 1. Even if one segment has a transaction
>> index,
>> >>> then
>> >>> > > we
>> >>> > > >> >> have
>> >>> > > >> >> > to
>> >>> > > >> >> > > > >>>> iterate
>> >>> > > >> >> > > > >>>> > > over all the metadata events.
>> >>> > > >> >> > > > >>>> > > 2. Assume that there are 10 segments and
>> >>> segment-5
>> >>> > > has a
>> >>> > > >> >> txn
>> >>> > > >> >> > > > index.
>> >>> > > >> >> > > > >>>> Once
>> >>> > > >> >> > > > >>>> > > the first 6 segments are deleted,
>> >>> > > >> >> > > > >>>> > >     due to breach by time/size/start-offset,
>> >>> then we
>> >>> > > >> should
>> >>> > > >> >> > > return
>> >>> > > >> >> > > > >>>> `true`
>> >>> > > >> >> > > > >>>> > > for "Is all the transaction indexes empty
>> for a
>> >>> > > >> partition?"
>> >>> > > >> >> > > > >>>> > >    query but it will return `false` until the
>> >>> broker
>> >>> > > >> gets
>> >>> > > >> >> > > > restarted
>> >>> > > >> >> > > > >>>> and
>> >>> > > >> >> > > > >>>> > we
>> >>> > > >> >> > > > >>>> > > have to resort to iterate over all the
>> metadata
>> >>> > > events.
>> >>> > > >> >> > > > >>>> > >
>> >>> > > >> >> > > > >>>> > > > 5. Potential alternative#2 - We know that
>> we
>> >>> may
>> >>> > > want
>> >>> > > >> the
>> >>> > > >> >> > > > indexes
>> >>> > > >> >> > > > >>>> of
>> >>> > > >> >> > > > >>>> > > multiple higher segments. Instead of fetching
>> >>> them
>> >>> > > >> >> > sequentially,
>> >>> > > >> >> > > > we
>> >>> > > >> >> > > > >>>> could
>> >>> > > >> >> > > > >>>> > > implement a parallel fetch or a pre-fetch for
>> >>> the
>> >>> > > >> indexes.
>> >>> > > >> >> > This
>> >>> > > >> >> > > > >>>> would
>> >>> > > >> >> > > > >>>> > help
>> >>> > > >> >> > > > >>>> > > hide the latency of sequentially fetching the
>> >>> trx
>> >>> > > >> indexes.
>> >>> > > >> >> > > > >>>> > >
>> >>> > > >> >> > > > >>>> > > We can implement parallel-fetch/prefetch once
>> >>> the
>> >>> > > tiered
>> >>> > > >> >> > storage
>> >>> > > >> >> > > > is
>> >>> > > >> >> > > > >>>> GAed.
>> >>> > > >> >> > > > >>>> > > Since this feature will be useful
>> >>> > > >> >> > > > >>>> > > to prefetch the next remote log segment and
>> it
>> >>> > expands
>> >>> > > >> the
>> >>> > > >> >> > scope
>> >>> > > >> >> > > > of
>> >>> > > >> >> > > > >>>> the
>> >>> > > >> >> > > > >>>> > > work.
>> >>> > > >> >> > > > >>>> > >
>> >>> > > >> >> > > > >>>> > > > 6. Should the proposed API take "segmentId"
>> >>> as a
>> >>> > > >> >> parameter
>> >>> > > >> >> > > > >>>> instead of
>> >>> > > >> >> > > > >>>> > > "topicIdPartition"? Suggesting because
>> >>> isTxnIdEmpty
>> >>> > is
>> >>> > > >> not
>> >>> > > >> >> a
>> >>> > > >> >> > > > >>>> property of
>> >>> > > >> >> > > > >>>> > a
>> >>> > > >> >> > > > >>>> > > partition, instead it's a property of a
>> specific
>> >>> > > >> segment.
>> >>> > > >> >> > > > >>>> > >
>> >>> > > >> >> > > > >>>> > > We propose to use the `topicIdPartition` in
>> >>> > > >> >> > > > >>>> RemoteLogMetadataManager.
>> >>> > > >> >> > > > >>>> > > The implementation can fold/reduce the value
>> of
>> >>> the
>> >>> > > >> >> individual
>> >>> > > >> >> > > log
>> >>> > > >> >> > > > >>>> > segment
>> >>> > > >> >> > > > >>>> > > `isTxnIdEmpty` flag. This is added to avoid
>> >>> scanning
>> >>> > > all
>> >>> > > >> >> the
>> >>> > > >> >> > > > >>>> metadata
>> >>> > > >> >> > > > >>>> > > events
>> >>> > > >> >> > > > >>>> > > when the partition does not have a
>> transaction
>> >>> index
>> >>> > > in
>> >>> > > >> >> any of
>> >>> > > >> >> > > the
>> >>> > > >> >> > > > >>>> > > segments.
>> >>> > > >> >> > > > >>>> > >
>> >>> > > >> >> > > > >>>> > > On Mon, Jun 17, 2024 at 4:05 PM Divij Vaidya
>> <
>> >>> > > >> >> > > > >>>> divijvaidy...@gmail.com>
>> >>> > > >> >> > > > >>>> > > wrote:
>> >>> > > >> >> > > > >>>> > >
>> >>> > > >> >> > > > >>>> > >> Hi Kamal
>> >>> > > >> >> > > > >>>> > >>
>> >>> > > >> >> > > > >>>> > >> Thanks for bringing this up. This is a
>> problem
>> >>> > worth
>> >>> > > >> >> solving.
>> >>> > > >> >> > > We
>> >>> > > >> >> > > > >>>> have
>> >>> > > >> >> > > > >>>> > >> faced
>> >>> > > >> >> > > > >>>> > >> this in situations where some Kafka clients
>> >>> default
>> >>> > > to
>> >>> > > >> >> > > > >>>> read_committed
>> >>> > > >> >> > > > >>>> > mode
>> >>> > > >> >> > > > >>>> > >> and end up having high latencies for remote
>> >>> fetches
>> >>> > > >> due to
>> >>> > > >> >> > this
>> >>> > > >> >> > > > >>>> > traversal
>> >>> > > >> >> > > > >>>> > >> across all segments.
>> >>> > > >> >> > > > >>>> > >>
>> >>> > > >> >> > > > >>>> > >> First some nits to clarify the KIP:
>> >>> > > >> >> > > > >>>> > >> 1. The motivation should make it clear that
>> >>> > traversal
>> >>> > > >> of
>> >>> > > >> >> all
>> >>> > > >> >> > > > >>>> segments is
>> >>> > > >> >> > > > >>>> > >> only in the worst case. If I am not mistaken
>> >>> > (please
>> >>> > > >> >> correct
>> >>> > > >> >> > me
>> >>> > > >> >> > > > if
>> >>> > > >> >> > > > >>>> > wrong),
>> >>> > > >> >> > > > >>>> > >> the traversal stops when it has found a
>> segment
>> >>> > > >> containing
>> >>> > > >> >> > LSO.
>> >>> > > >> >> > > > >>>> > >> 2. There is nothing like a non-txn topic. A
>> >>> > > transaction
>> >>> > > >> >> may
>> >>> > > >> >> > be
>> >>> > > >> >> > > > >>>> started
>> >>> > > >> >> > > > >>>> > on
>> >>> > > >> >> > > > >>>> > >> any topic. Perhaps, rephrase the statement
>> in
>> >>> the
>> >>> > KIP
>> >>> > > >> so
>> >>> > > >> >> that
>> >>> > > >> >> > > it
>> >>> > > >> >> > > > is
>> >>> > > >> >> > > > >>>> > clear
>> >>> > > >> >> > > > >>>> > >> to the reader.
>> >>> > > >> >> > > > >>>> > >> 3. The hyperlink in the "the broker has to
>> >>> traverse
>> >>> > > all
>> >>> > > >> >> > the..."
>> >>> > > >> >> > > > >>>> seems
>> >>> > > >> >> > > > >>>> > >> incorrect. Did you want to point to
>> >>> > > >> >> > > > >>>> > >>
>> >>> > > >> >> > > > >>>> > >>
>> >>> > > >> >> > > > >>>> >
>> >>> > > >> >> > > > >>>>
>> >>> > > >> >> > > >
>> >>> > > >> >> > >
>> >>> > > >> >> >
>> >>> > > >> >>
>> >>> > > >>
>> >>> > >
>> >>> >
>> >>>
>> https://github.com/apache/kafka/blob/21d60eabab8a14c8002611c65e092338bf584314/core/src/main/scala/kafka/log/LocalLog.scala#L444
>> >>> > > >> >> > > > >>>> > >> ?
>> >>> > > >> >> > > > >>>> > >> 4. In the testing section, could we add a
>> test
>> >>> > plan?
>> >>> > > >> For
>> >>> > > >> >> > > > example, I
>> >>> > > >> >> > > > >>>> > would
>> >>> > > >> >> > > > >>>> > >> list down adding a test which would verify
>> the
>> >>> > number
>> >>> > > >> of
>> >>> > > >> >> > calls
>> >>> > > >> >> > > > >>>> made to
>> >>> > > >> >> > > > >>>> > >> RLMM. This test would have a higher number
>> of
>> >>> calls
>> >>> > > >> >> earlier
>> >>> > > >> >> > vs.
>> >>> > > >> >> > > > >>>> after
>> >>> > > >> >> > > > >>>> > this
>> >>> > > >> >> > > > >>>> > >> KIP.
>> >>> > > >> >> > > > >>>> > >>
>> >>> > > >> >> > > > >>>> > >> Other thoughts:
>> >>> > > >> >> > > > >>>> > >> 4. Potential alternative - Instead of
>> having an
>> >>> > > >> algorithm
>> >>> > > >> >> > where
>> >>> > > >> >> > > > we
>> >>> > > >> >> > > > >>>> > >> traverse
>> >>> > > >> >> > > > >>>> > >> across segment metadata and looking for
>> >>> > isTxnIdxEmpty
>> >>> > > >> >> flag,
>> >>> > > >> >> > > > should
>> >>> > > >> >> > > > >>>> we
>> >>> > > >> >> > > > >>>> > >> directly introduce a nextSegmentWithTrxInx()
>> >>> > > function?
>> >>> > > >> >> This
>> >>> > > >> >> > > would
>> >>> > > >> >> > > > >>>> allow
>> >>> > > >> >> > > > >>>> > >> implementers to optimize the otherwise
>> linear
>> >>> scan
>> >>> > > >> across
>> >>> > > >> >> > > > metadata
>> >>> > > >> >> > > > >>>> for
>> >>> > > >> >> > > > >>>> > all
>> >>> > > >> >> > > > >>>> > >> segments by using techniques such as skip
>> list
>> >>> etc.
>> >>> > > >> >> > > > >>>> > >> 5. Potential alternative#2 - We know that we
>> >>> may
>> >>> > want
>> >>> > > >> the
>> >>> > > >> >> > > indexes
>> >>> > > >> >> > > > >>>> of
>> >>> > > >> >> > > > >>>> > >> multiple higher segments. Instead of
>> fetching
>> >>> them
>> >>> > > >> >> > > sequentially,
>> >>> > > >> >> > > > we
>> >>> > > >> >> > > > >>>> > could
>> >>> > > >> >> > > > >>>> > >> implement a parallel fetch or a pre-fetch
>> for
>> >>> the
>> >>> > > >> indexes.
>> >>> > > >> >> > This
>> >>> > > >> >> > > > >>>> would
>> >>> > > >> >> > > > >>>> > help
>> >>> > > >> >> > > > >>>> > >> hide the latency of sequentially fetching
>> the
>> >>> trx
>> >>> > > >> indexes.
>> >>> > > >> >> > > > >>>> > >> 6. Should the proposed API take "segmentId"
>> as
>> >>> a
>> >>> > > >> parameter
>> >>> > > >> >> > > > instead
>> >>> > > >> >> > > > >>>> of
>> >>> > > >> >> > > > >>>> > >> "topicIdPartition"? Suggesting because
>> >>> isTxnIdEmpty
>> >>> > > is
>> >>> > > >> >> not a
>> >>> > > >> >> > > > >>>> property
>> >>> > > >> >> > > > >>>> > of a
>> >>> > > >> >> > > > >>>> > >> partition, instead it's a property of a
>> >>> specific
>> >>> > > >> segment.
>> >>> > > >> >> > > > >>>> > >>
>> >>> > > >> >> > > > >>>> > >> Looking forward to hearing your thoughts
>> about
>> >>> the
>> >>> > > >> >> > > alternatives.
>> >>> > > >> >> > > > >>>> Let's
>> >>> > > >> >> > > > >>>> > get
>> >>> > > >> >> > > > >>>> > >> this fixed.
>> >>> > > >> >> > > > >>>> > >>
>> >>> > > >> >> > > > >>>> > >> --
>> >>> > > >> >> > > > >>>> > >> Divij Vaidya
>> >>> > > >> >> > > > >>>> > >>
>> >>> > > >> >> > > > >>>> > >>
>> >>> > > >> >> > > > >>>> > >>
>> >>> > > >> >> > > > >>>> > >> On Mon, Jun 17, 2024 at 11:40 AM Kamal
>> >>> > > Chandraprakash <
>> >>> > > >> >> > > > >>>> > >> kamal.chandraprak...@gmail.com> wrote:
>> >>> > > >> >> > > > >>>> > >>
>> >>> > > >> >> > > > >>>> > >> > Hi all,
>> >>> > > >> >> > > > >>>> > >> >
>> >>> > > >> >> > > > >>>> > >> > I have opened a KIP-1058
>> >>> > > >> >> > > > >>>> > >> > <
>> >>> > > >> >> > > > >>>> > >> >
>> >>> > > >> >> > > > >>>> > >>
>> >>> > > >> >> > > > >>>> >
>> >>> > > >> >> > > > >>>>
>> >>> > > >> >> > > >
>> >>> > > >> >> > >
>> >>> > > >> >> >
>> >>> > > >> >>
>> >>> > > >>
>> >>> > >
>> >>> >
>> >>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1058%3A+Txn+consumer+exerts+pressure+on+remote+storage+when+reading+non-txn+topic
>> >>> > > >> >> > > > >>>> > >> > >
>> >>> > > >> >> > > > >>>> > >> > to reduce the pressure on remote storage
>> when
>> >>> > > >> >> transactional
>> >>> > > >> >> > > > >>>> consumers
>> >>> > > >> >> > > > >>>> > >> are
>> >>> > > >> >> > > > >>>> > >> > reading non-txn topics from remote
>> storage.
>> >>> > > >> >> > > > >>>> > >> >
>> >>> > > >> >> > > > >>>> > >> >
>> >>> > > >> >> > > > >>>> > >> >
>> >>> > > >> >> > > > >>>> > >>
>> >>> > > >> >> > > > >>>> >
>> >>> > > >> >> > > > >>>>
>> >>> > > >> >> > > >
>> >>> > > >> >> > >
>> >>> > > >> >> >
>> >>> > > >> >>
>> >>> > > >>
>> >>> > >
>> >>> >
>> >>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1058%3A+Txn+consumer+exerts+pressure+on+remote+storage+when+reading+non-txn+topic
>> >>> > > >> >> > > > >>>> > >> >
>> >>> > > >> >> > > > >>>> > >> > Feedbacks and suggestions are welcome.
>> >>> > > >> >> > > > >>>> > >> >
>> >>> > > >> >> > > > >>>> > >> > Thanks,
>> >>> > > >> >> > > > >>>> > >> > Kamal
>> >>> > > >> >> > > > >>>> > >> >
>> >>> > > >> >> > > > >>>> > >>
>> >>> > > >> >> > > > >>>> > >
>> >>> > > >> >> > > > >>>> >
>> >>> > > >> >> > > > >>>>
>> >>> > > >> >> > > > >>>
>> >>> > > >> >> > > >
>> >>> > > >> >> > >
>> >>> > > >> >> >
>> >>> > > >> >>
>> >>> > > >> >
>> >>> > > >>
>> >>> > > >
>> >>> > >
>> >>> >
>> >>>
>> >>
>>
>

Re: [DISCUSS] KIP-1058: Txn consumer exerts pressure on remote storage when reading non-txn topic

Reply via email to