I was reading up on Kafka: how leader election works, log truncation, and so
on. One thing that struck me was how records that were written to the log but
never committed (they did not propagate to all of the ISR, so the high
watermark did not advance past them and they were never committed) get
truncated by the replication reconciliation logic. Since they are not
committed, they are not visible to consumers, because reads only go up to the
high watermark. The producer client will also be notified, or will eventually
find out, that the message did not propagate successfully, and that can be
handled through application logic. That case seems straightforward.
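For concreteness, this is roughly what that looks like from the producer side
with acks=all (just a sketch; the broker address, topic name and serializers
are placeholders):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class AckedProduce {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            // acks=all: the send only succeeds once the record is replicated
            // to the ISR, i.e. once it is committed and the high watermark
            // can advance past it.
            props.put("acks", "all");
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("events", "k", "v"),
                              (metadata, exception) -> {
                    if (exception != null) {
                        // The write was not committed; the application decides
                        // whether to retry, log, or route it elsewhere.
                    }
                });
            }
        }
    }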

KIP-405
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage>
talks about tiered storage and about Kafka being an important part of, and an
entry point for, data infrastructure. Elsewhere I have read that Kafka also
serves as a way of replaying data to restore state or to inspect it.
KIP-320
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-320%3A+Allow+fetchers+to+detect+and+handle+log+truncation>
mentions that users who want higher availability opt for unclean leader
election.
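For reference, that availability-versus-durability trade-off is the per-topic
unclean.leader.election.enable setting; assuming a placeholder topic and
broker address, it can be turned on via the AdminClient roughly like this:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class EnableUncleanElection {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder
            try (AdminClient admin = AdminClient.create(props)) {
                // Favour availability over durability for this one topic.
                ConfigResource topic =
                    new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
                AlterConfigOp op = new AlterConfigOp(
                    new ConfigEntry("unclean.leader.election.enable", "true"),
                    AlterConfigOp.OpType.SET);
                admin.incrementalAlterConfigs(
                    Collections.singletonMap(topic, Collections.singletonList(op)))
                     .all().get();
            }
        }
    }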

Would it be fair to assume that users might be interested in a feature, or at
least one that can be enabled by the user, where a write to Kafka (even with
an acks=0 / no-acks configuration, or with unclean leader election) remains
written until the cleanup or delete retention configuration acts on it?

If this is a valid use case, I am thinking of suggesting a KIP around picking
up the data that is about to be truncated, at truncation time, and replaying
it as if it came in through a fresh produce request. That is, truncating data
would not remove it from Kafka; it would instead be placed back in the log at
a different offset.
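To make the idea concrete, here is a very rough sketch. The hook that hands
over the about-to-be-truncated batch does not exist today and is purely
hypothetical; everything after it is just the ordinary produce path, so the
data reappears later in the log at new offsets:

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class TruncationReplayer {
        // "aboutToBeTruncated" stands in for a hypothetical handle to the
        // records a replica is about to discard during truncation.
        public static void replay(KafkaProducer<byte[], byte[]> producer,
                                  Iterable<ProducerRecord<byte[], byte[]>> aboutToBeTruncated) {
            for (ProducerRecord<byte[], byte[]> record : aboutToBeTruncated) {
                // Re-enters the log as a fresh write, at a new offset.
                producer.send(record);
            }
            producer.flush();
        }
    }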

Regards,
Vinoth
