Hi all,

I am using Flink streaming with the Kafka consumer connector (FlinkKafkaConsumer) and the file sink (StreamingFileSink) in cluster mode with an exactly-once policy. The file sink writes its files to the local disk. I've noticed that when a job fails and automatic restart is enabled, the task managers look for the leftover files from the last failed job (the hidden in-progress/pending files). Since the tasks can be assigned to different task managers, which don't have those files on their local disks, this leads to repeated failures on every restart attempt. The only workaround I have found so far is to delete the hidden files and resubmit the job. If I understand correctly (and please correct me if I'm wrong), the events in the hidden files were never committed back to Kafka, so there is no data loss.
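For context, the pipeline is essentially the following minimal sketch (the topic name, Kafka address, output path, group id, and checkpoint interval below are placeholders, not our real values):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class KafkaToFileJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Exactly-once checkpoints; part files are only finalized when a checkpoint completes.
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

        Properties kafkaProps = new Properties();
        kafkaProps.setProperty("bootstrap.servers", "kafka:9092"); // placeholder
        kafkaProps.setProperty("group.id", "my-group");            // placeholder

        FlinkKafkaConsumer<String> source =
                new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), kafkaProps);

        // Writes to the task managers' local disks; this is where the hidden
        // in-progress/pending files pile up and break the restart.
        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path("file:///data/flink/output"),
                              new SimpleStringEncoder<String>("UTF-8"))
                .build();

        env.addSource(source)
           .addSink(sink);

        env.execute("kafka-to-local-files");
    }
}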
Is there a way to force Flink to ignore the files that were already written? Or is there a better way to implement this (perhaps with savepoints)?

Best regards,
Eyal Peer