Clarification on failOnDataLoss Behavior in Spark Structured Streaming with Kafka

2025-07-10 Thread Nimrod Ofek
Hi everyone, I'm currently working with Spark Structured Streaming integrated with Kafka and had some questions regarding the failOnDataLoss option. The current documentation states: *"Whether to fail the query when it's possible that data is lost (e.g., topics are deleted, or offsets are out of

Re: Clarification on failOnDataLoss Behavior in Spark Structured Streaming with Kafka

2025-07-10 Thread Khalid Mammadov
I use this option in development environments where jobs are not actively running and the Kafka topic has a retention policy enabled. This means that when a streaming job runs, it may find that the last offset it read is no longer there, and in this case it falls back to the starting position (i.e. earliest or latest) sp
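
For readers following the thread, here is a minimal sketch of how the option discussed above is typically passed to the Kafka source in PySpark. The helper names (`kafka_source_options`, `read_kafka_stream`), the broker address `localhost:9092`, and the topic `events` are hypothetical and only for illustration; the option keys (`kafka.bootstrap.servers`, `subscribe`, `startingOffsets`, `failOnDataLoss`) are the ones the Kafka source accepts. It is not a statement about how Spark resolves missing offsets internally.

```python
def kafka_source_options(bootstrap_servers, topic,
                         fail_on_data_loss=True, starting_offsets="latest"):
    """Build the option map for the Kafka source (hypothetical helper).

    failOnDataLoss defaults to true in Spark; passing False here is the
    development-environment setting described in the reply above.
    """
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
        # Spark source options are passed as strings.
        "failOnDataLoss": str(fail_on_data_loss).lower(),
    }


def read_kafka_stream(spark, options):
    """Create a streaming DataFrame from an existing SparkSession.

    Assumes the spark-sql-kafka connector is on the classpath.
    """
    return spark.readStream.format("kafka").options(**options).load()


# Assumed development setup: do not fail the query if offsets have aged out.
dev_options = kafka_source_options(
    "localhost:9092", "events",
    fail_on_data_loss=False, starting_offsets="earliest",
)
```

With a live session this would be used as `df = read_kafka_stream(spark, dev_options)`. Keeping the option map in a separate function makes it easy to leave `failOnDataLoss` at its default of true in production while relaxing it only in development.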