Forgot to cc Kostas

On 23/04/2020 12:11, Eyal Pe'er wrote:
> Hi all,
>
> I am using Flink streaming with the Kafka consumer connector
> (FlinkKafkaConsumer) and the file sink (StreamingFileSink) in cluster
> mode with an exactly-once policy.
>
> The file sink writes the files to the local disk.
>
> I've noticed that if a job fails and automatic restart is on, the task
> managers look for the leftover files from the last failed job
> (hidden files).
>
> Obviously, since the tasks can be assigned to different task managers,
> this adds up to more failures over and over again.
>
> The only solution I have found so far is to delete the hidden files and
> resubmit the job.
>
> If I understand it correctly (and please correct me if I'm wrong), the
> events in the hidden files were not committed to the bootstrap server,
> so there is no data loss.
>
> Is there a way to force Flink to ignore the files that were already
> written? Or maybe there is a better way to implement the solution
> (perhaps somehow with savepoints)?
>
> Best regards,
>
> Eyal Peer
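For context, here is a minimal sketch of the kind of pipeline Eyal describes, using the Flink 1.x Java API; the topic name, Kafka address, group id, output path, and checkpoint interval are assumptions, not taken from the original message:

    import java.util.Properties;

    import org.apache.flink.api.common.serialization.SimpleStringEncoder;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class KafkaToFileJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Exactly-once delivery relies on checkpointing: the file sink only
            // finalizes its in-progress part files when a checkpoint completes.
            env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "kafka:9092"); // assumed address
            props.setProperty("group.id", "file-sink-job");       // assumed group id

            FlinkKafkaConsumer<String> source =
                    new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props);

            // Writing to a path that is local to each task manager, as described
            // in the question; a shared filesystem would avoid the recovery issue.
            StreamingFileSink<String> sink = StreamingFileSink
                    .forRowFormat(new Path("file:///data/out"),
                                  new SimpleStringEncoder<String>("UTF-8"))
                    .build();

            env.addSource(source).addSink(sink);
            env.execute("kafka-to-file");
        }
    }

The hidden files Eyal mentions are presumably the sink's in-progress part files (prefixed with a dot). On restore, the sink expects to find and resume them, which is likely why recovery keeps failing when the output directory is local to each task manager rather than on a filesystem accessible from all of them.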