Hi Aljoscha,

Thank you for your response. I guess I will then manually rename the pending files.

Does this mean, however, that the BucketingSink is not exactly-once as described in the docs, since in this case (failure of the job and failure of checkpoints) there will be duplicates? Or am I missing something in the notion of exactly-once guarantees?
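For illustration, the manual rename mentioned above could look roughly like the sketch below. It rests on assumptions that are not confirmed in this thread: that the sink uses the default BucketingSink naming, where a pending file is called `_<part-file>.pending`, and that the output folder is reachable as a local path (on HDFS the same renames would have to go through the Hadoop tooling instead). The helper name `finalize_pending` is hypothetical, and the script does not handle in-progress or valid-length files.

```python
from pathlib import Path

# Assumption: default BucketingSink naming, i.e. "_<part-file>.pending".
PENDING_PREFIX = "_"
PENDING_SUFFIX = ".pending"


def finalize_pending(bucket_dir, dry_run=True):
    """Rename pending files under bucket_dir to their final names.

    Returns the list of (old_path, new_path) pairs. With dry_run=True
    the renames are only planned, not performed.
    """
    renames = []
    pattern = f"{PENDING_PREFIX}*{PENDING_SUFFIX}"
    for path in sorted(Path(bucket_dir).rglob(pattern)):
        if not path.is_file():
            continue
        # Strip the pending prefix and suffix to get the final file name.
        final_name = path.name[len(PENDING_PREFIX):-len(PENDING_SUFFIX)]
        if not final_name:
            continue
        target = path.with_name(final_name)
        if target.exists():
            # Never overwrite a file that is already in its final state.
            continue
        renames.append((path, target))
        if not dry_run:
            path.rename(target)
    return renames
```

Running it first with `dry_run=True` and reviewing the planned pairs seems prudent, since only the pending files known to be correct should be moved to their final names.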
Best,
Yassine

2017-04-28 15:47 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:

> Hi,
>
> Yes, your analysis is correct. The pending files are not recognised as
> such because they were never in any checkpointed state that could be
> restored. I’m afraid it’s not possible to build the sink state just from
> the files existing in the output folder. The reason we have state in the
> first place is so that we can figure out what each of the files in the
> output folder is.
>
> Maybe you could manually move the pending files that you know are correct
> to “final”?
>
> Best,
> Aljoscha
>
> On 28. Apr 2017, at 11:22, Yassine MARZOUGUI <y.marzou...@mindlytix.com> wrote:
>
> Hi all,
>
> I have a failed job containing a BucketingSink. The last successful
> checkpoint was before the source started emitting data. The following
> checkpoints all failed due to the long timeout, as I mentioned here:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Checkpoints-very-slow-with-high-backpressure-td12762.html
>
> The TaskManager then failed. Upon recovery, the pending files did not
> move to the finished state.
>
> Is that because the sink was not able to checkpoint the list of pending
> files?
> Is it possible to build the sink state just from the output folder and the
> suffixes of the files?
>
> Thanks,
> Yassine