Hi everyone,

I'm currently working with Spark Structured Streaming integrated with Kafka and have some questions regarding the `failOnDataLoss` option.
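For reference, here is a minimal sketch of how we wire up the Kafka source, just to show where the option sits; the topic name, bootstrap servers, and checkpoint path are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object KafkaFailOnDataLossSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-failOnDataLoss-sketch")
      .getOrCreate()

    // Kafka source; "kafka:9092" and "events" are placeholder values.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "kafka:9092")
      .option("subscribe", "events")
      .option("startingOffsets", "earliest")
      // The option in question. It defaults to "true"; we are trying to
      // understand what actually happens before deciding whether to set it to "false".
      .option("failOnDataLoss", "true")
      .load()

    val query = df.writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/checkpoints/events") // placeholder path
      .start()

    query.awaitTermination()
  }
}
```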
The current documentation states: *"Whether to fail the query when it's possible that data is lost (e.g., topics are deleted, or offsets are out of range). This may be a false alarm. You can disable it when it doesn't work as you expected."*

ChatGPT offers some explanation, but I would like a more detailed and definitive answer, and I think the documentation should include that explanation as well. I'd appreciate clarification on the following points:

1. What exactly does "this may be a false alarm" mean in this context? Under what circumstances would a false alarm occur, and what should I expect when it happens?
2. What does it mean to "fail the query"? Does the query skip the problematic offset and continue, or does it stop entirely? How is the next offset determined, and what happens on restart?
3. If an offset is out of range, how does Spark determine the next offset to use? Does it default to latest, earliest, or something else?

Understanding the expected behavior here would really help us configure this option appropriately for our use case.

Thanks in advance for your help!

Best regards,
Nimrod