Hello everyone, I'm working with Spark Structured Streaming (3.5.5), consuming from Kafka, and I've encountered a scenario with `failOnDataLoss` that I'd like to clarify with the community.
The documentation for `failOnDataLoss` states: "[...] This may be a false alarm. You can disable it when it doesn't work as you expected." However, I'm unsure whether my specific scenario qualifies as a "false alarm". Here's the sequence of events I'm observing:

- The Kafka topic contains messages with offsets 1 to 'n'
- The Spark Streaming application successfully consumes all 'n' messages
- The streaming checkpoints show all 'n' messages as committed (offsets 1-n)
- The topic's retention policy kicks in and purges messages 1-n (this is expected behaviour)
- The topic receives a new message at offset n+1
- When Spark constructs a new batch to process message n+1, it fails with a potential data loss error, because it apparently attempts to fetch messages from the previous batch starting at offset 'n' (which obviously no longer exists due to retention)

I'd appreciate any insights from the community, especially regarding whether this falls under the "false alarm" category mentioned in the docs.

Thank you!

Best regards,
Christian
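For context, here is a minimal sketch of the kind of source definition involved, with `failOnDataLoss` made explicit. The broker address, topic name, and app name are placeholders I've made up for illustration, not details from the report above:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-retention-demo").getOrCreate()

stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
    .option("subscribe", "events")                        # placeholder topic
    # Default is "true": the query aborts when the checkpointed offsets
    # have already been purged by retention. Setting it to "false" makes
    # Spark log a warning and resume from the earliest available offset.
    .option("failOnDataLoss", "false")
    .load()
)
```

Disabling the check works around the error, but it also suppresses it for genuine data-loss cases, which is why I'd like to understand whether my scenario is really a "false alarm" before doing so.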
