Re: Making job fail on Checkpoint Expired?

Congxian Qiu Fri, 03 Apr 2020 02:18:03 -0700

Currently, only declined checkpoints are counted in
`continuousFailureCounter`.
Could you please share why you want the job to fail when a checkpoint
expires?


Best,
Congxian


Timo Walther <twal...@apache.org> wrote on Thu, Apr 2, 2020 at 11:23 PM:

> Hi Robin,
>
> this is a very good observation and maybe even unintended behavior.
> Maybe Arvid in CC is more familiar with the checkpointing?
>
> Regards,
> Timo
>
>
> On 02.04.20 15:37, Robin Cassan wrote:
> > Hi all,
> >
> > I am wondering if there is a way to make a Flink job fail (not cancel
> > it) when one or several checkpoints have failed due to expiring
> > (taking longer than the timeout)?
> > I am using Flink 1.9.2 and have set
> > `setTolerableCheckpointFailureNumber(1)`, which doesn't do the trick.
> > Looking into the CheckpointFailureManager.java class, it looks like this
> > only works when the checkpoint failure reason is
> > `CHECKPOINT_DECLINED`, but the number of failures isn't incremented on
> > `CHECKPOINT_EXPIRED`.
> > Am I missing something?
> >
> > Thanks!
>
>
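
To illustrate the behavior discussed in this thread, here is a minimal, self-contained sketch of a failure counter that only counts some failure reasons. It is not Flink's actual `CheckpointFailureManager`: the class `FailureCounterSketch`, its methods, and the "fail once the counter exceeds the tolerable number" rule are assumptions made for illustration. Only the names `continuousFailureCounter`, `CHECKPOINT_DECLINED`, and `CHECKPOINT_EXPIRED` come from the messages above.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical, simplified model of a checkpoint failure counter (not Flink code). */
public class FailureCounterSketch {

    /** The two failure reasons mentioned in the thread. */
    public enum Reason { CHECKPOINT_DECLINED, CHECKPOINT_EXPIRED }

    private final int tolerableFailures;
    private final Set<Reason> countedReasons;
    private int continuousFailureCounter = 0;

    public FailureCounterSketch(int tolerableFailures, Set<Reason> countedReasons) {
        this.tolerableFailures = tolerableFailures;
        this.countedReasons = countedReasons;
    }

    /**
     * Records a failed checkpoint and returns true if the job should fail.
     * Assumption: the job fails once the counter exceeds the tolerable number.
     */
    public boolean onCheckpointFailure(Reason reason) {
        if (countedReasons.contains(reason)) {
            continuousFailureCounter++;
        }
        return continuousFailureCounter > tolerableFailures;
    }

    /** A successful checkpoint resets the run of consecutive failures. */
    public void onCheckpointSuccess() {
        continuousFailureCounter = 0;
    }

    public static void main(String[] args) {
        // Behavior described in the thread: only declined checkpoints are counted.
        FailureCounterSketch declinedOnly =
                new FailureCounterSketch(1, EnumSet.of(Reason.CHECKPOINT_DECLINED));
        System.out.println(declinedOnly.onCheckpointFailure(Reason.CHECKPOINT_EXPIRED)); // false
        System.out.println(declinedOnly.onCheckpointFailure(Reason.CHECKPOINT_EXPIRED)); // false

        // Behavior Robin asks for: expired checkpoints are counted as well.
        FailureCounterSketch both = new FailureCounterSketch(1, EnumSet.allOf(Reason.class));
        System.out.println(both.onCheckpointFailure(Reason.CHECKPOINT_EXPIRED)); // false
        System.out.println(both.onCheckpointFailure(Reason.CHECKPOINT_EXPIRED)); // true
    }
}
```

With a tolerable failure number of 1, the declined-only variant never trips on expired checkpoints, which matches what Robin observed. The second variant trips on the second consecutive expiration.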
