The sequence summary looks right to me. For log normalization, are you referring to compaction? The segment's first and last offsets might change, but a batch keeps its offsets when compaction occurs.
Hope that helps.
Justine

On Mon, Aug 7, 2023 at 8:59 AM Matthias J. Sax <mj...@apache.org> wrote:

> > but the base offset may change during log normalizing.
>
> Not sure what you mean by "normalization" but offsets are immutable, so
> they don't change. (To be fair, I am not an expert on brokers, so not
> sure how this works in detail when log compaction kicks in).
>
> > This field is given by the producer and the broker should only read it.
>
> Sounds right. The point is that the broker has an "expected"
> value for it, and if the provided value does not match the expected one,
> the write is rejected to begin with.
>
>
> -Matthias
>
> On 8/7/23 6:35 AM, tison wrote:
> > Hi Matthias and Justine,
> >
> > Thanks for your reply!
> >
> > I can summarize the answer as -
> >
> > Record offset = base offset + offset delta. This field is calculated by the
> > broker and the delta won't change but the base offset may change during log
> > normalizing.
> > Record sequence = base sequence + (offset) delta. This field is given by
> > the producer and the broker should only read it.
> >
> > Is it correct?
> >
> > I implemented the manipulation part of base offset following this
> > understanding at [1].
> >
> > Best,
> > tison.
> >
> > [1]
> > https://github.com/tisonkun/kafka-api/blob/d080ab7e4b57c0ab0182e0b254333f400e616cd2/simplesrv/src/lib.rs#L391-L394
> >
> >
> > On Wed, Aug 2, 2023 at 04:19, Justine Olshan <jols...@confluent.io.invalid> wrote:
> >
> >> For what it's worth -- the sequence number is not calculated
> >> "baseOffset/baseSequence + offset delta" but rather by monotonically
> >> increasing for a given epoch. If the epoch is bumped, we reset back to
> >> zero.
> >> This may mean that the offset and sequence may match, but do not strictly
> >> need to be the same. The sequence number will also always come from the
> >> client and be in the produce records sent to the Kafka broker.
> >>
> >> As for offsets, there is some code in the log layer that maintains the log
> >> end offset and assigns offsets to the records. The produce handling on the
> >> leader should typically assign the offset.
> >> I believe you can find that code here:
> >>
> >> https://github.com/apache/kafka/blob/b9a45546a7918799b6fb3c0fe63b56f47d8fcba9/core/src/main/scala/kafka/log/UnifiedLog.scala#L766
> >>
> >> Justine
> >>
> >> On Tue, Aug 1, 2023 at 11:38 AM Matthias J. Sax <mj...@apache.org> wrote:
> >>
> >>> The _offset_ is the position of the record in the partition.
> >>>
> >>> The _sequence number_ is a unique ID that allows the broker to de-duplicate
> >>> messages. It requires the producer to implement the idempotency protocol
> >>> (part of Kafka transactions); thus, sequence numbers are optional and as
> >>> long as you don't want to support idempotent writes, you don't need to
> >>> worry about them. (If you want to dig into details, check out KIP-98, which
> >>> is the original KIP about Kafka TX).
> >>>
> >>> HTH,
> >>> -Matthias
> >>>
> >>> On 8/1/23 2:19 AM, tison wrote:
> >>>> Hi,
> >>>>
> >>>> I'm writing a Kafka API Rust codec library[1] to understand how Kafka
> >>>> models its concepts and how the core business logic works.
> >>>>
> >>>> While implementing the codec for Records[2], I saw a pair of fields,
> >>>> "sequence" and "offset". Both of them are calculated by
> >>>> baseOffset/baseSequence + offset delta. I'm a bit confused about how to
> >>>> deal with them properly - what's the difference between these two concepts
> >>>> logically?
> >>>>
> >>>> Also, to understand how the core business logic works, I wrote a simple
> >>>> server based on my codec library, and observed that the server may need to
> >>>> update the offset for produced records. How does Kafka set the correct
> >>>> offset for each produced record? And how does Kafka maintain the
> >>>> calculation for offset and sequence during these modifications?
> >>>>
> >>>> I'd appreciate it if anyone can answer the question or give some insights :D
> >>>>
> >>>> Best,
> >>>> tison.
> >>>>
> >>>> [1] https://github.com/tisonkun/kafka-api
> >>>> [2] https://kafka.apache.org/documentation/#messageformat
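
For anyone reading this thread later, here is a minimal Rust sketch of the behavior discussed above: the broker assigns base offsets from the log end offset, while the sequence comes from the producer and is only checked against an expected value. All names (Batch, PartitionLog, AppendError) are hypothetical and are not taken from Kafka or from the kafka-api crate. The sketch tracks a single producer, ignores sequence wraparound, duplicate detection, and fencing of stale epochs, and assumes that a new epoch starts again at sequence zero.

```rust
/// Simplified record batch header; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub base_offset: i64,    // overwritten by the broker on append
    pub base_sequence: i32,  // chosen by the producer; the broker only reads it
    pub producer_epoch: i16, // chosen by the producer
    pub record_count: i32,
}

impl Batch {
    /// Record offset = base offset + offset delta.
    pub fn record_offset(&self, delta: i32) -> i64 {
        self.base_offset + i64::from(delta)
    }

    /// Record sequence = base sequence + offset delta (wraparound not modeled).
    pub fn record_sequence(&self, delta: i32) -> i32 {
        self.base_sequence + delta
    }
}

#[derive(Debug, PartialEq)]
pub enum AppendError {
    /// The producer-supplied base sequence did not match the expected one.
    OutOfOrderSequence { expected: i32, got: i32 },
}

/// Toy partition log for one producer (assumption: a real broker keeps
/// this state per producer id).
#[derive(Debug, Default)]
pub struct PartitionLog {
    log_end_offset: i64,
    // (epoch, next expected sequence) of the last accepted batch, if any.
    producer_state: Option<(i16, i32)>,
    batches: Vec<Batch>,
}

impl PartitionLog {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn batches(&self) -> &[Batch] {
        &self.batches
    }

    /// Validates the sequence, then assigns the base offset.
    /// Returns the base offset given to the batch.
    pub fn append(&mut self, mut batch: Batch) -> Result<i64, AppendError> {
        // Same epoch: continue where the last batch ended.
        // First batch or a different epoch: expect the sequence to restart at 0.
        let expected = match self.producer_state {
            Some((epoch, next)) if epoch == batch.producer_epoch => next,
            _ => 0,
        };
        if batch.base_sequence != expected {
            // Mismatch: reject the write before touching the log.
            return Err(AppendError::OutOfOrderSequence {
                expected,
                got: batch.base_sequence,
            });
        }

        // The broker, not the producer, decides the offsets.
        batch.base_offset = self.log_end_offset;
        self.log_end_offset += i64::from(batch.record_count);
        self.producer_state = Some((batch.producer_epoch, expected + batch.record_count));

        let base_offset = batch.base_offset;
        self.batches.push(batch);
        Ok(base_offset)
    }
}
```

With this sketch, appending a 3-record batch followed by a 2-record batch from the same epoch gives them base offsets 0 and 3, whatever base_offset the producer put in the batch. Resending the first batch unchanged is rejected because its base sequence no longer matches the expected value.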