Hi,
Yes, your analysis is correct. The pending files are not recognised as such 
because they were never in any checkpointed state that could be restored. I’m 
afraid it’s not possible to build the sink state just from the files existing 
in the output folder. The reason we have state in the first place is so that we 
can figure out what each of the files in the output folder is.

Maybe you could manually move the pending files that you know are correct to 
“final”?
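
A minimal sketch of what that manual step could look like, assuming the output
folder is on a local or mounted filesystem and the sink uses what I believe are
the default BucketingSink names ("_" prefix and ".pending" suffix for pending
files). The function name finalize_pending and its parameters are made up for
this example. Adjust them to your sink configuration, and use the HDFS tooling
instead if the files live on HDFS. It only renames files; it cannot tell you
whether a pending file is actually complete and correct.

```python
from pathlib import Path


def finalize_pending(bucket_dir, prefix="_", suffix=".pending", dry_run=True):
    """Rename pending part files in one bucket directory to their final names.

    Returns a list of (pending_path, final_path) pairs. With dry_run=True
    nothing is renamed and the pairs are only the planned renames.
    The prefix/suffix defaults are assumptions about the sink configuration.
    """
    renamed = []
    for path in sorted(Path(bucket_dir).iterdir()):
        name = path.name
        if not path.is_file():
            continue
        if not (name.startswith(prefix) and name.endswith(suffix)):
            continue  # in-progress and already-final files are left alone
        if len(name) <= len(prefix) + len(suffix):
            continue  # nothing would remain after stripping prefix and suffix
        final_name = name[len(prefix):len(name) - len(suffix)]
        final_path = path.with_name(final_name)
        if final_path.exists():
            continue  # never overwrite an existing final file
        if not dry_run:
            path.rename(final_path)
        renamed.append((path, final_path))
    return renamed
```

Run it with dry_run=True first and check the planned renames before doing the
real move, and only on buckets whose pending files you have verified yourself.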

Best,
Aljoscha

> On 28. Apr 2017, at 11:22, Yassine MARZOUGUI <y.marzou...@mindlytix.com> 
> wrote:
> 
> Hi all,
> 
> I have a failed job containing a BucketingSink. The last successful 
> checkpoint was before the source started emitting data. The following 
> checkpoints all failed due to the long timeout, as I mentioned here:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Checkpoints-very-slow-with-high-backpressure-td12762.html
> 
> The TaskManager then failed. Upon recovery, the pending files did not move 
> to the finished state. 
> 
> Is that because the sink was not able to checkpoint the list of pending files?
> Is it possible to build the sink state just from the output folder and the 
> suffixes of the files?
> 
> Thanks,
> Yassine
