Re: Behaviour of the BucketingSink when checkpoints fail

Yassine MARZOUGUI Fri, 28 Apr 2017 06:53:48 -0700

Hi Aljoscha,

Thank you for your response. I guess then I will manually rename the
pending files. Does this however mean that the BucketingSink is not
exactly-once as it is described in the docs, since in this case (failure of
the job and failure of checkpoints) there will be duplicates? Or am I
missing something in the notion of exactly-once guarantees?
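
For the manual rename mentioned above, a minimal sketch might look like the
following. It assumes the sink uses the default pending naming
(`_<name>.pending`), that the output folder is reachable as a local path
(on HDFS the equivalent moves would go through the HDFS client instead), and
that the pending files have already been checked to be complete. The function
name `finalize_pending` and the `dry_run` flag are hypothetical, not part of
Flink.

```python
from pathlib import Path

# Assumed BucketingSink defaults; adjust if the sink was configured differently.
PENDING_PREFIX = "_"
PENDING_SUFFIX = ".pending"


def finalize_pending(base_dir, dry_run=True):
    """Rename _<name>.pending files under base_dir to <name>.

    Returns the list of (source, destination) pairs that were, or in
    dry-run mode would be, renamed. Hypothetical helper, not a Flink API.
    """
    moves = []
    pattern = f"{PENDING_PREFIX}*{PENDING_SUFFIX}"
    for src in sorted(Path(base_dir).rglob(pattern)):
        if not src.is_file():
            continue
        # Strip the pending prefix and suffix to get the final file name.
        final_name = src.name[len(PENDING_PREFIX):-len(PENDING_SUFFIX)]
        if not final_name:
            continue
        dst = src.with_name(final_name)
        if dst.exists():
            # Never overwrite an already finalized file.
            continue
        moves.append((src, dst))
        if not dry_run:
            src.rename(dst)
    return moves
```

Running it first with `dry_run=True` only lists the planned renames, which
leaves room to exclude files that should not be finalized. In-progress files
are left untouched.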


Best,
Yassine

2017-04-28 15:47 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:

> Hi,
> Yes, your analysis is correct. The pending files are not recognised as
> such because they were never in any checkpointed state that could be
> restored. I’m afraid it’s not possible to build the sink state just from
> the files existing in the output folder. The reason we have state in the
> first place is so that we can figure out what each of the files in the
> output folder is.
>
> Maybe you could manually move the pending files that you know are correct
> to “final”?
>
> Best,
> Aljoscha
>
> On 28. Apr 2017, at 11:22, Yassine MARZOUGUI <y.marzou...@mindlytix.com>
> wrote:
>
> Hi all,
>
> I have a failed job containing a BucketingSink. The last successful
> checkpoint was before the source started emitting data. The following
> checkpoints all failed due to the long timeout, as I mentioned here:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Checkpoints-very-slow-with-high-backpressure-td12762.html
>
> The TaskManager then failed. Upon recovery, the pending files did not
> move to the finished state.
>
> Is that because the sink was not able to checkpoint the list of pending
> files?
> Is it possible to build the sink state just from the output folder and the
> suffixes of the files?
>
> Thanks,
> Yassine
>
>
>
