Hi,

Yes, basically none of the exactly-once/at-least-once guarantees hold if checkpointing does not work correctly. This would also be the case, for example, when reading from Kafka and writing to Kafka.
Best,
Aljoscha

> On 28. Apr 2017, at 15:53, Yassine MARZOUGUI <y.marzou...@mindlytix.com> wrote:
>
> Hi Aljoscha,
>
> Thank you for your response. I guess then I will manually rename the pending
> files. Does this, however, mean that the BucketingSink is not exactly-once as
> it is described in the docs, since in this case (failure of the job and
> failure of checkpoints) there will be duplicates? Or am I missing something
> in the notion of exactly-once guarantees?
>
> Best,
> Yassine
>
> 2017-04-28 15:47 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:
> Hi,
> Yes, your analysis is correct. The pending files are not recognised as such
> because they were never in any checkpointed state that could be restored. I’m
> afraid it’s not possible to build the sink state just from the files existing
> in the output folder. The reason we have state in the first place is so that
> we can figure out what each of the files in the output folder is.
>
> Maybe you could manually move the pending files that you know are correct to
> “final”?
>
> Best,
> Aljoscha
>
>> On 28. Apr 2017, at 11:22, Yassine MARZOUGUI <y.marzou...@mindlytix.com> wrote:
>>
>> Hi all,
>>
>> I have a failed job containing a BucketingSink. The last successful
>> checkpoint was before the source started emitting data. The following
>> checkpoints all failed due to the long timeout, as I mentioned here:
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Checkpoints-very-slow-with-high-backpressure-td12762.html
>>
>> The TaskManager then failed. Upon recovery, the pending files did not
>> move to the finished state.
>>
>> Is that because the sink was not able to checkpoint the list of pending files?
>> Is it possible to build the sink state just from the output folder and the
>> suffixes of the files?
>>
>> Thanks,
>> Yassine
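For the manual workaround suggested in this thread (moving the pending files you know are correct to their final names), here is a minimal Python sketch for a local filesystem. It assumes BucketingSink-style default naming, where a pending file looks like "_part-0-0.pending" and its finished name is "part-0-0"; adjust the two constants if your sink used a different prefix or suffix. The function name finalize_pending_files and the is_known_good predicate are hypothetical helpers, not part of Flink. On HDFS you would do the same renames with "hdfs dfs -mv" or the Hadoop FileSystem API instead of pathlib.

```python
from pathlib import Path
from typing import Callable, List, Tuple

# ASSUMPTION: BucketingSink-style default naming. A pending file is
# "_part-0-0.pending" and its finished name is "part-0-0".
# Change these if the sink was configured with another prefix/suffix.
PENDING_PREFIX = "_"
PENDING_SUFFIX = ".pending"


def finalize_pending_files(
    base_dir: Path,
    is_known_good: Callable[[Path], bool] = lambda p: True,
    dry_run: bool = True,
) -> List[Tuple[Path, Path]]:
    """Hypothetical helper: rename pending part files under base_dir
    (recursively, so bucket subdirectories are covered) to final names.

    Only files accepted by is_known_good are considered. With dry_run=True
    nothing is renamed; the planned (source, target) pairs are returned.
    """
    planned: List[Tuple[Path, Path]] = []
    for src in sorted(base_dir.rglob(PENDING_PREFIX + "*" + PENDING_SUFFIX)):
        if not src.is_file() or not is_known_good(src):
            continue
        # Strip the pending prefix and suffix to get the final file name.
        final_name = src.name[len(PENDING_PREFIX):-len(PENDING_SUFFIX)]
        if not final_name:
            continue
        dst = src.with_name(final_name)
        if dst.exists():
            # Never overwrite a file that is already in the finished state.
            continue
        planned.append((src, dst))
        if not dry_run:
            src.rename(dst)
    return planned
```

Running it first with dry_run=True lets you review the planned renames before touching anything. Deciding which files are "known good" is still a manual judgment (the predicate only makes that decision explicit), and renaming does not remove any duplicates that a replay after recovery may have written.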