Hi,

I think the current behaviour of min_pause_between_checkpoints is either a bug, or we should at least discuss whether the pause should also be respected after failed checkpoints. As far as I know, there is no ongoing work to add a backoff, so I suggest you open a Jira issue and make the case for it.
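To make the proposal concrete in such an issue, here is a minimal sketch of what a backoff after failed checkpoints could look like. It is not Flink code: the class CheckpointBackoff, its methods, and the doubling policy are all hypothetical and only illustrate the idea that the pause grows with each consecutive failure and resets after a success.

```java
import java.time.Duration;

/**
 * Hypothetical helper, NOT part of Flink: computes the pause to wait before
 * triggering the next checkpoint, doubling it after every consecutive failure.
 */
final class CheckpointBackoff {
    private final long basePauseMillis; // pause used after a successful checkpoint
    private final long maxPauseMillis;  // upper bound for the backoff
    private int consecutiveFailures = 0;

    CheckpointBackoff(Duration basePause, Duration maxPause) {
        if (basePause.isNegative() || maxPause.compareTo(basePause) < 0) {
            throw new IllegalArgumentException("need 0 <= basePause <= maxPause");
        }
        this.basePauseMillis = basePause.toMillis();
        this.maxPauseMillis = maxPause.toMillis();
    }

    /** A completed checkpoint resets the backoff. */
    void onCheckpointCompleted() {
        consecutiveFailures = 0;
    }

    /** A failed or expired checkpoint increases the backoff. */
    void onCheckpointFailed() {
        consecutiveFailures++;
    }

    /** Returns base * 2^failures, capped at the maximum, without overflowing. */
    Duration nextPause() {
        long pause = basePauseMillis;
        for (int i = 0; i < consecutiveFailures && pause < maxPauseMillis; i++) {
            // Doubling would exceed the cap, so clamp instead of multiplying.
            pause = (pause > maxPauseMillis / 2) ? maxPauseMillis : pause * 2;
        }
        return Duration.ofMillis(pause);
    }

    public static void main(String[] args) {
        CheckpointBackoff backoff =
                new CheckpointBackoff(Duration.ofSeconds(1), Duration.ofSeconds(60));
        for (int i = 0; i < 8; i++) {
            System.out.println("failures=" + i + " pause=" + backoff.nextPause());
            backoff.onCheckpointFailed();
        }
    }
}
```

With a store like S3 that throttles requests, a cap on the pause keeps a long run of failures from delaying checkpoints indefinitely, and the reset on success restores the normal cadence once the store has recovered.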
Best,
Stefan

> On 08.06.2018 at 06:30, vipul singh <[email protected]> wrote:
>
> Hello all,
>
> Are there any recommendations on using a backoff when experiencing
> checkpointing failures?
> What we have seen is that when a checkpoint starts to expire, the next
> checkpoint doesn't take the previous failure into account and starts soon
> after. We experimented with min_pause_between_checkpoints, but that seems
> to work only for successful checkpoints (the same is discussed in this thread
> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/minPauseBetweenCheckpoints-for-failed-checkpoints-td20152.html>).
>
> Are there any recommendations on how to have a backoff, or is there something
> in the works to add a backoff in case of checkpointing failures? This seems
> very valuable when checkpointing to an external location like S3, where one
> can potentially be throttled or get errors like TooBusyException from S3 (for
> example, as in this Jira issue
> <https://issues.apache.org/jira/browse/FLINK-9061>).
>
> Please let us know!
> Thanks,
> Vipul
