The sequence summary looks right to me.
For log normalization, are you referring to compaction? The segment's first
and last offsets might change, but a batch keeps its offsets when
compaction occurs.
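
To make the compaction point concrete, here is a minimal Python sketch. The `Record`, `Batch`, and `compact` names are hypothetical and model only a toy keyed log, not Kafka's actual log cleaner. It assumes a record's offset is base offset + offset delta, and that compaction keeps only the latest record per key while each batch keeps its base offset.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    offset_delta: int  # position relative to the batch's base offset
    key: str
    value: str


@dataclass
class Batch:
    base_offset: int
    records: List[Record]

    def offsets(self) -> List[int]:
        # Record offset = base offset + offset delta.
        return [self.base_offset + r.offset_delta for r in self.records]


def compact(batches: List[Batch]) -> List[Batch]:
    """Toy compaction: keep the latest record per key, never renumber."""
    latest: Dict[str, int] = {}
    for b in batches:
        for r in b.records:
            latest[r.key] = b.base_offset + r.offset_delta
    cleaned = []
    for b in batches:
        kept = [r for r in b.records
                if latest[r.key] == b.base_offset + r.offset_delta]
        if kept:
            # Base offset and deltas are untouched (simplifying assumption).
            cleaned.append(Batch(b.base_offset, kept))
    return cleaned


log = [
    Batch(0, [Record(0, "k1", "a"), Record(1, "k2", "b"), Record(2, "k1", "c")]),
    Batch(3, [Record(0, "k2", "d"), Record(1, "k3", "e")]),
]
cleaned = compact(log)
print([b.offsets() for b in cleaned])  # [[2], [3, 4]]
```

In this toy model the first surviving offset moves from 0 to 2, yet every surviving record still reports the offset it was originally assigned.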

Hope that helps.
Justine

On Mon, Aug 7, 2023 at 8:59 AM Matthias J. Sax <mj...@apache.org> wrote:

> > but the base offset may change during log normalizing.
>
> Not sure what you mean by "normalization" but offsets are immutable, so
> they don't change. (To be fair, I am not an expert on brokers, so I am not
> sure how this works in detail when log compaction kicks in).
>
> > This field is given by the producer and the broker should only read it.
>
> Sounds right. The point is that the broker has an "expected"
> value for it, and if the provided value does not match the expected one,
> the write is rejected to begin with.
>
>
> -Matthias
>
> On 8/7/23 6:35 AM, tison wrote:
> > Hi Matthias and Justine,
> >
> > Thanks for your reply!
> >
> > I can summarize the answer as -
> >
> > Record offset = base offset + offset delta. This field is calculated by
> > the broker and the delta won't change but the base offset may change
> > during log normalizing.
> > Record sequence = base sequence + (offset) delta. This field is given by
> > the producer and the broker should only read it.
> >
> > Is it correct?
> >
> > I implemented the base offset manipulation following this
> > understanding at [1].
> >
> > Best,
> > tison.
> >
> > [1] https://github.com/tisonkun/kafka-api/blob/d080ab7e4b57c0ab0182e0b254333f400e616cd2/simplesrv/src/lib.rs#L391-L394
> >
> >
> > Justine Olshan <jols...@confluent.io.invalid> wrote on Wed, Aug 2, 2023 at 04:19:
> >
> >> For what it's worth -- the sequence number is not calculated as
> >> "baseOffset/baseSequence + offset delta"; rather, it increases
> >> monotonically for a given epoch. If the epoch is bumped, we reset back
> >> to zero.
> >> This may mean that the offset and sequence may match, but they do not
> >> strictly need to be the same. The sequence number will also always come
> >> from the client and be in the produce records sent to the Kafka broker.
> >>
> >> As for offsets, there is some code in the log layer that maintains the
> >> log end offset and assigns offsets to the records. The produce handling
> >> on the leader should typically assign the offset.
> >> I believe you can find that code here:
> >>
> >>
> >> https://github.com/apache/kafka/blob/b9a45546a7918799b6fb3c0fe63b56f47d8fcba9/core/src/main/scala/kafka/log/UnifiedLog.scala#L766
> >>
> >> Justine
> >>
> >> On Tue, Aug 1, 2023 at 11:38 AM Matthias J. Sax <mj...@apache.org> wrote:
> >>
> >>> The _offset_ is the position of the record in the partition.
> >>>
> >>> The _sequence number_ is a unique ID that allows the broker to
> >>> de-duplicate messages. It requires the producer to implement the
> >>> idempotency protocol (part of Kafka transactions); thus, sequence
> >>> numbers are optional, and as long as you don't want to support
> >>> idempotent writes, you don't need to worry about them. (If you want to
> >>> dig into details, check out KIP-98, the original KIP about Kafka TX).
> >>>
> >>> HTH,
> >>>     -Matthias
> >>>
> >>> On 8/1/23 2:19 AM, tison wrote:
> >>>> Hi,
> >>>>
> >>>> I'm writing a Kafka API Rust codec library[1] to understand how Kafka
> >>>> models its concepts and how the core business logic works.
> >>>>
> >>>> While implementing the codec for Records[2], I saw a pair of fields,
> >>>> "sequence" and "offset". Both of them are calculated by
> >>>> baseOffset/baseSequence + offset delta. So I'm a bit confused about how
> >>>> to deal with them properly - what's the difference between these two
> >>>> concepts logically?
> >>>>
> >>>> Also, to understand how the core business logic works, I wrote a simple
> >>>> server based on my codec library and observed that the server may need
> >>>> to update the offset for produced records. How does Kafka set the
> >>>> correct offset for each produced record? And how does Kafka maintain
> >>>> the calculation for offset and sequence during these modifications?
> >>>>
> >>>> I'd appreciate it if anyone can answer the question or give some
> >>>> insights :D
> >>>>
> >>>> Best,
> >>>> tison.
> >>>>
> >>>> [1] https://github.com/tisonkun/kafka-api
> >>>> [2] https://kafka.apache.org/documentation/#messageformat
> >>>>
> >>>
> >>
> >
>
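
The append path discussed in this thread can be sketched roughly as follows. This is a simplified Python model, not Kafka's implementation: `PartitionLog`, `ProducerState`, and `SequenceError` are hypothetical names, and the sketch ignores duplicate detection, sequence wrap-around, and transactions. It assumes only what the thread states: the sequence comes from the producer and is checked against an expected value, it restarts at zero when the epoch is bumped, and the leader assigns offsets from its log end offset.

```python
from dataclasses import dataclass, field
from typing import Dict


class SequenceError(Exception):
    """Raised when a batch does not carry the sequence the log expects."""


@dataclass
class ProducerState:
    epoch: int
    last_sequence: int  # sequence of the last appended record, -1 if none yet


@dataclass
class PartitionLog:
    log_end_offset: int = 0  # offset the next appended record will receive
    producers: Dict[int, ProducerState] = field(default_factory=dict)

    def append(self, producer_id: int, epoch: int,
               base_sequence: int, record_count: int) -> int:
        """Validate the sequence, then return the base offset assigned."""
        state = self.producers.get(producer_id, ProducerState(epoch, -1))
        if epoch < state.epoch:
            raise SequenceError("stale producer epoch")
        # A bumped epoch restarts the sequence at zero.
        expected = 0 if epoch > state.epoch else state.last_sequence + 1
        if base_sequence != expected:
            raise SequenceError(
                f"expected sequence {expected}, got {base_sequence}")
        # The sequence is only read; the offset is assigned by the log.
        base_offset = self.log_end_offset
        self.log_end_offset += record_count
        self.producers[producer_id] = ProducerState(
            epoch, base_sequence + record_count - 1)
        return base_offset
```

With two producers writing to the same partition, each producer's sequence advances independently while offsets advance with every append, which is why the two values may match but do not need to.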
