Hey Xiang,

Thanks for your questions! This is getting to the limit of my knowledge,
but I'll answer as best I can.

The partitionLeaderEpoch is set only once in a batch's lifetime (during
Produce) and is not mutated at any other time. That includes when the data is
fetched by other replicas or by consumers, and when partition leadership
changes.
I believe this field is a record of which partitionLeaderEpoch was active
at the time the batch was produced, and can be different for different
batches within a partition as leadership changes. I wouldn't call this
"outdated", as I think there is an intentional use for this historical
leadership data in the log [1].
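To illustrate why that historical epoch is useful, here is a minimal Java
sketch of the idea behind the leader epoch cache in KIP-101. Because each
batch keeps the epoch it was produced under, a broker can record the first
offset written in each epoch and answer "where did epoch N end?" when a
follower needs to decide how far to truncate. LeaderEpochCacheSketch and its
methods are hypothetical names. The sketch ignores persistence of the cache,
unknown epochs, and other details of the real protocol.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of a KIP-101-style leader epoch cache.
// It relies only on each batch keeping the partitionLeaderEpoch it was
// produced under; it ignores persistence, unknown epochs and truncation of the cache.
class LeaderEpochCacheSketch {
    // epoch -> offset of the first batch written under that epoch
    private final TreeMap<Integer, Long> epochStartOffsets = new TreeMap<>();
    private long logEndOffset = 0L;

    // Called for each appended batch covering offsets [baseOffset, lastOffset].
    void onAppend(int partitionLeaderEpoch, long baseOffset, long lastOffset) {
        // Only the first batch seen for an epoch defines that epoch's start.
        epochStartOffsets.putIfAbsent(partitionLeaderEpoch, baseOffset);
        logEndOffset = lastOffset + 1;
    }

    // End offset of an epoch: the start of the next higher epoch, or the
    // log end offset if no later epoch has written anything yet.
    long endOffsetFor(int epoch) {
        Map.Entry<Integer, Long> next = epochStartOffsets.higherEntry(epoch);
        return next == null ? logEndOffset : next.getValue();
    }

    // Hypothetical helper: a follower truncates to the smaller of its own log
    // end offset and the leader's end offset for the follower's latest epoch.
    static long truncationOffset(long followerLogEndOffset, long leaderEndOffsetForEpoch) {
        return Math.min(followerLogEndOffset, leaderEndOffsetForEpoch);
    }

    public static void main(String[] args) {
        LeaderEpochCacheSketch leader = new LeaderEpochCacheSketch();
        leader.onAppend(0, 0L, 9L);   // epoch 0 wrote offsets 0..9
        leader.onAppend(1, 10L, 14L); // epoch 1 wrote offsets 10..14
        leader.onAppend(2, 15L, 19L); // epoch 2 wrote offsets 15..19
        // A follower whose latest epoch is 1 but whose log ends at offset 18
        // has a divergent tail (15..17) that the leader never had.
        long target = truncationOffset(18L, leader.endOffsetFor(1));
        System.out.println("truncate follower to offset " + target);
    }
}
```

Running main prints "truncate follower to offset 15": the follower's
offsets 15 to 17 were never part of the leader's epoch-1 history, so they are
discarded. This only works because the batches kept their original epochs
rather than being restamped when leadership changed.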

[1]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-101+-+Alter+Replication+Protocol+to+use+Leader+Epoch+rather+than+High+Watermark+for+Truncation

Thanks,
Greg

On Wed, Oct 23, 2024 at 8:07 PM Xiang Zhang <xiangzhang1...@gmail.com>
wrote:

> Thank you, Greg, for all the knowledge. Some follow-up questions:
>
> Does partitionLeaderEpoch always reflect the latest leader election, or can
> an old epoch be allowed? If it is the first case, then I agree
> partitionLeaderEpoch should not be included in the CRC computation. But it
> raises some new questions for me: which roles check the checksum, and
> under what circumstances? I am asking because after the producing process,
> any record in the broker log can have an outdated leader epoch field once a
> leader election happens, right? Do they get updated?
>
> Sorry for all the questions. I have been using Kafka for several years and
> want to dive a little deeper into it. I have become more interested and am
> ready to find out more on my own, but I still look forward to your thoughts
> if the questions above make sense.
>
>
> Thanks,
> Xiang
>
> On Thu, Oct 24, 2024 at 00:25, Greg Harris <greg.har...@aiven.io.invalid> wrote:
>
> > Hi Xiang,
> >
> > Thanks for your question! That sentence is a justification for why the
> > partitionLeaderEpoch field is not included in the CRC.
> >
> > If you mutate fields which are included in a CRC, you need to recompute the
> > CRC value. See [1] for mutating the maxTimestamp. Compare that with [2] for
> > setting the partitionLeaderEpoch.
> > This makes setting the partitionLeaderEpoch faster than setting the
> > maxTimestamp, because no CRC has to be recomputed over the batch. And
> > because setting the partitionLeaderEpoch happens on every Produce request,
> > it was optimized in the protocol design.
> > It does have the tradeoff that corruption of the partitionLeaderEpoch is
> > not detected by the CRC, but someone decided this was worth the
> > optimization to the Produce flow.
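A minimal Java sketch (Java 9+ for java.util.zip.CRC32C) of the difference
between the two DefaultRecordBatch setters referenced in this message. It
assumes the v2 batch layout from the message-format docs: partitionLeaderEpoch
at byte 12, the CRC at byte 17, and a CRC-32C computed from the attributes
field at byte 21 through the end of the batch. Because the epoch sits before
the checksummed range, stamping it is a single in-place write. BatchCrcSketch
and its methods are hypothetical names for illustration, not Kafka's API.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32C;

// Simplified model of a v2 record batch header. Byte offsets follow the
// message-format docs; class and method names are hypothetical.
// Assumes a heap buffer from ByteBuffer.allocate (array offset 0).
class BatchCrcSketch {
    static final int PARTITION_LEADER_EPOCH_OFFSET = 12;
    static final int CRC_OFFSET = 17;
    static final int ATTRIBUTES_OFFSET = 21;   // CRC covers [21, end of batch)
    static final int MAX_TIMESTAMP_OFFSET = 35;
    static final int RECORDS_OFFSET = 61;      // size of the batch header

    // CRC-32C over every byte from attributes to the end of the batch.
    static long computeCrc(ByteBuffer batch) {
        CRC32C crc = new CRC32C();
        crc.update(batch.array(), ATTRIBUTES_OFFSET, batch.limit() - ATTRIBUTES_OFFSET);
        return crc.getValue();
    }

    static void writeCrc(ByteBuffer batch) {
        // Store the low 32 bits; the field holds an unsigned 32-bit value.
        batch.putInt(CRC_OFFSET, (int) computeCrc(batch));
    }

    static boolean isValid(ByteBuffer batch) {
        return Integer.toUnsignedLong(batch.getInt(CRC_OFFSET)) == computeCrc(batch);
    }

    // maxTimestamp lies inside the CRC range, so the CRC must be recomputed,
    // which means re-reading the whole batch.
    static void setMaxTimestamp(ByteBuffer batch, long maxTimestamp) {
        batch.putLong(MAX_TIMESTAMP_OFFSET, maxTimestamp);
        writeCrc(batch);
    }

    // partitionLeaderEpoch lies before the CRC range: one 4-byte write, no rescan.
    static void setPartitionLeaderEpoch(ByteBuffer batch, int epoch) {
        batch.putInt(PARTITION_LEADER_EPOCH_OFFSET, epoch);
    }

    public static void main(String[] args) {
        ByteBuffer batch = ByteBuffer.allocate(RECORDS_OFFSET + 16); // header + 16 dummy record bytes
        for (int i = RECORDS_OFFSET; i < batch.limit(); i++) batch.put(i, (byte) i);
        writeCrc(batch);                                  // as the producer would

        setPartitionLeaderEpoch(batch, 7);                // broker stamps the epoch
        System.out.println(isValid(batch));               // CRC still matches

        batch.putLong(MAX_TIMESTAMP_OFFSET, 123L);        // change a covered field only
        System.out.println(isValid(batch));               // CRC no longer matches
        setMaxTimestamp(batch, 123L);                     // rewrite and recompute
        System.out.println(isValid(batch));

        // The tradeoff: a corrupted epoch is invisible to the CRC.
        batch.putInt(PARTITION_LEADER_EPOCH_OFFSET, 999);
        System.out.println(isValid(batch));
    }
}
```

Running main should print true, false, true, true: stamping the epoch leaves
the stored CRC valid, changing maxTimestamp invalidates it until it is
recomputed, and a corrupted epoch still passes the check, which is the
tradeoff described above.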
> >
> > I don't have more information on why this optimization was made for
> > partitionLeaderEpoch and not maxTimestamp.
> >
> > Hope this helps,
> > Greg
> >
> > [1]
> > https://github.com/apache/kafka/blob/2d896d9130f121e75ccba2d913bdffa358cf3867/clients/src/main/java/org/apache/kafka/common/record/DefaultRecordBatch.java#L371-L382
> > [2]
> > https://github.com/apache/kafka/blob/2d896d9130f121e75ccba2d913bdffa358cf3867/clients/src/main/java/org/apache/kafka/common/record/DefaultRecordBatch.java#L385-L387
> >
> >
> > On Tue, Oct 22, 2024 at 7:51 PM Xiang Zhang <xiangzhang1...@gmail.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > I am reading the official docs here:
> > > https://kafka.apache.org/documentation/#messageformat, and I could not
> > > fully understand them. If someone could clarify this for me, it would be
> > > much appreciated. The sentence is:
> > >
> > > The partition leader epoch field is not included in the CRC computation to
> > > avoid the need to recompute the CRC when this field is assigned for every
> > > batch that is received by the broker.
> > >
> > > I just don’t really get what the highlighted part is trying to say.
> > >
> > > Regards,
> > > Xiang Zhang
> > >
> >
>
