Hello everyone,

I'm working with Spark Structured Streaming (3.5.5) consuming from Kafka and 
encountered a scenario with `failOnDataLoss` that I'd like to clarify with the 
community.

The documentation for `failOnDataLoss` states:
"[...] This may be a false alarm. You can disable it when it doesn't work as 
you expected."
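
For context, the option in question is passed to the Kafka source as a string. A minimal sketch of how it is set (the broker address and topic name below are placeholders, not from my setup):

```python
# Options for the Spark Kafka source; setting failOnDataLoss to "false"
# suppresses the potential-data-loss failure described below.
kafka_options = {
    "kafka.bootstrap.servers": "broker:9092",  # hypothetical broker
    "subscribe": "my-topic",                   # hypothetical topic
    "failOnDataLoss": "false",                 # don't fail the query on missing offsets
}

# Applied to a streaming read roughly like:
# df = spark.readStream.format("kafka").options(**kafka_options).load()
```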

However, I'm unsure whether my specific scenario qualifies as a "false alarm".

Here's the sequence of events I'm observing:

- Kafka topic contains messages with offsets 1 to 'n'
- Spark Streaming application successfully consumes all 'n' messages
- Streaming checkpoints show all 'n' messages as committed (offsets 1-n)
- Topic retention policy kicks in and purges messages 1-n (this is expected 
behaviour)
- Topic receives a new message at offset n+1
- When Spark constructs a new batch to process message n+1, it fails with a 
potential data loss error, because it apparently attempts to fetch the previous 
batch's range starting at offset 'n', which no longer exists due to retention
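
To make the failure mode concrete, here is a simplified model of the comparison I believe is happening (this is my own sketch, not Spark's actual source): on each new batch, the Kafka source compares the offset it wants to read from against the earliest offset Kafka still retains, and flags data loss if the former is smaller.

```python
def check_data_loss(resume_offset: int, earliest_available: int) -> bool:
    """Simplified model of the failOnDataLoss check (assumption, not Spark code).

    resume_offset      -- offset Spark plans to read next
    earliest_available -- smallest offset Kafka still retains after retention
    """
    return resume_offset < earliest_available

# Scenario from the list above: all messages up to offset n are committed,
# retention purges 1..n, so n+1 is now the earliest retained offset.
n = 100

# Resuming at n+1 (the next unread offset): nothing was actually lost.
assert check_data_loss(resume_offset=n + 1, earliest_available=n + 1) is False

# But if Spark starts from offset 'n' (the previous batch's range), the
# check fires even though no unread message was skipped -- a "false alarm".
assert check_data_loss(resume_offset=n, earliest_available=n + 1) is True
```

If this model is right, the error in my scenario would indeed be a false alarm, since every message up to 'n' was already processed before retention removed it.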

I'd appreciate any insights from the community, especially regarding whether 
this falls under the "false alarm" category mentioned in the docs.

Thank you!


Best regards,
Christian
