Hi everyone,
I'm currently working with Spark Structured Streaming integrated with Kafka
and have some questions regarding the failOnDataLoss option.
The current documentation states:
*"Whether to fail the query when it's possible that data is lost (e.g.,
topics are deleted, or offsets are out of range). It may be a false alarm."*
I use this option in development environments where jobs are not running
continuously and the Kafka topics have a retention policy enabled. This means
that when a streaming job starts, it may find that the last offset it read has
already been deleted by retention, in which case it falls back to the starting
position (i.e. earliest or latest).
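For reference, this is roughly the setup I mean. A minimal sketch in PySpark,
assuming a running Spark session with the Kafka connector on the classpath;
the broker address "localhost:9092" and topic name "events" are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
    .option("subscribe", "events")                        # placeholder topic
    # Where to start when there is no checkpointed offset yet.
    .option("startingOffsets", "earliest")
    # Don't fail the query when previously committed offsets have been
    # aged out by the topic's retention policy.
    .option("failOnDataLoss", "false")
    .load()
)
```

With failOnDataLoss set to "false", the query logs a warning and resumes from
the configured starting position instead of terminating.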