Currently, only declined checkpoints are counted toward `continuousFailureCounter`. Could you please share why you want the job to fail when a checkpoint expires?
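
To make the behavior concrete, here is a minimal plain-Java sketch of the counting rule described above. It is a simplified model, not the code in CheckpointFailureManager.java: the class `FailureCounterSketch` and its methods are hypothetical, and both the "fail once the counter exceeds the tolerable number" threshold and the reset on a successful checkpoint are my assumptions about how the counter is used.

```java
// Simplified model only: NOT Flink's actual source. All names except the two
// failure reasons and continuousFailureCounter are hypothetical.
class FailureCounterSketch {
    enum Reason { CHECKPOINT_DECLINED, CHECKPOINT_EXPIRED }

    private final int tolerableFailures;       // stands in for setTolerableCheckpointFailureNumber(n)
    private int continuousFailureCounter = 0;

    FailureCounterSketch(int tolerableFailures) {
        this.tolerableFailures = tolerableFailures;
    }

    /** Records a checkpoint failure; returns true if the job should fail. */
    boolean onFailure(Reason reason) {
        if (reason == Reason.CHECKPOINT_DECLINED) {
            continuousFailureCounter++;
        }
        // CHECKPOINT_EXPIRED falls through without touching the counter.
        // Assumption: the job fails once the counter exceeds the tolerable number.
        return continuousFailureCounter > tolerableFailures;
    }

    /** Assumption: a completed checkpoint resets the run of failures. */
    void onSuccess() {
        continuousFailureCounter = 0;
    }

    int counter() {
        return continuousFailureCounter;
    }

    public static void main(String[] args) {
        FailureCounterSketch m = new FailureCounterSketch(1);
        // Any number of expired checkpoints leaves the counter at zero.
        for (int i = 0; i < 5; i++) {
            System.out.println("expired  -> fail job? " + m.onFailure(Reason.CHECKPOINT_EXPIRED));
        }
        // Declined checkpoints are the only ones that move the counter.
        System.out.println("declined -> fail job? " + m.onFailure(Reason.CHECKPOINT_DECLINED));
        System.out.println("declined -> fail job? " + m.onFailure(Reason.CHECKPOINT_DECLINED));
    }
}
```

In this model, a job whose checkpoints only ever expire never reaches the threshold, which matches what Robin observed with `setTolerableCheckpointFailureNumber(1)`.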

Best,
Congxian

Timo Walther <twal...@apache.org> wrote on Thu, Apr 2, 2020 at 11:23 PM:

> Hi Robin,
>
> this is a very good observation and maybe even unintended behavior.
> Maybe Arvid in CC is more familiar with the checkpointing?
>
> Regards,
> Timo
>
>
> On 02.04.20 15:37, Robin Cassan wrote:
> > Hi all,
> >
> > I am wondering if there is a way to make a Flink job fail (not cancel
> > it) when one or several checkpoints have failed due to being expired
> > (taking longer than the timeout)?
> > I am using Flink 1.9.2 and have set
> > `setTolerableCheckpointFailureNumber(1)`, which doesn't do the trick.
> > Looking into the CheckpointFailureManager.java class, it looks like this
> > only works when the checkpoint failure reason is
> > `CHECKPOINT_DECLINED`, but the number of failures isn't incremented on
> > `CHECKPOINT_EXPIRED`.
> > Am I missing something?
> >
> > Thanks!