Hi,

I think there's currently no option for achieving this on Flink 1.4.x.

Best,
Aljoscha

> On 15. Feb 2018, at 18:11, Ron Crocker <rcroc...@newrelic.com> wrote:
>
> Thanks Till and Aljoscha. Are there good options for 1.4? I’d rather not fork
> to get this, but I’ll do it if I have to.
>
> Ron
>
>> On Feb 14, 2018, at 2:43 AM, Aljoscha Krettek <aljos...@apache.org> wrote:
>>
>> Hi Ron,
>>
>> Keep in mind, though, that this feature will only be available with the
>> upcoming Flink 1.5. Just making sure you don't go looking for this and are
>> surprised if you don't find it.
>>
>> Best,
>> Aljoscha
>>
>>
>>> On 14. Feb 2018, at 10:20, Till Rohrmann <trohrm...@apache.org> wrote:
>>>
>>> Hi Ron,
>>>
>>> you should be able to turn off the Task failure in case of a checkpoint
>>> failure by setting `ExecutionConfig.setFailTaskOnCheckpointError(false)`.
>>> This setting should change the behavior such that checkpoint failures will
>>> simply fail the distributed checkpoint.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Tue, Feb 13, 2018 at 11:41 PM, Ron Crocker <rcroc...@newrelic.com> wrote:
>>>
>>>> What would it take to be a little more flexible in handling checkpoint
>>>> failures?
>>>>
>>>> Right now I have a team that’s checkpointing into S3, via the
>>>> FsStateBackend and an appropriate URL. Sometimes these checkpoints fail.
>>>> They’re transient, though, and a retry would likely work.
>>>>
>>>> However, when they fail, their job exits and restarts from the last
>>>> checkpoint. That’s fine, but I’d rather it tried again before failing, and
>>>> even after failing just keep running and do another checkpoint. Maybe this
>>>> is something that should be configurable - # of retries, failure strategy,
>>>> …
>>>>
>>>> Ron
>>
>
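Ron's suggestion of a configurable number of retries plus a failure strategy could look roughly like the minimal Java sketch below. It is not a Flink API: `CheckpointRetryPolicy`, `FailureStrategy`, `CheckpointOutcome`, and the `BooleanSupplier` standing in for one checkpoint attempt (for example, one write to S3 through the FsStateBackend) are hypothetical names chosen only for illustration. Flink 1.4.x offers no such option; per Till's description, the Flink 1.5 setting `ExecutionConfig.setFailTaskOnCheckpointError(false)` makes a failure fail only that distributed checkpoint, which loosely corresponds to `SKIP_CHECKPOINT` here without the retries.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical (not Flink API): what to do once every attempt of a checkpoint has failed. */
enum FailureStrategy {
    FAIL_JOB,        // like the 1.4 behavior described above: fail and restart from the last checkpoint
    SKIP_CHECKPOINT  // give up on this checkpoint, keep running, wait for the next one
}

/** Hypothetical result of one checkpoint round under the policy. */
enum CheckpointOutcome { COMPLETED, SKIPPED, JOB_FAILED }

/** Hypothetical sketch of a bounded-retry policy for a single checkpoint. */
final class CheckpointRetryPolicy {
    private final int maxRetries;
    private final FailureStrategy strategy;

    CheckpointRetryPolicy(int maxRetries, FailureStrategy strategy) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        this.maxRetries = maxRetries;
        this.strategy = strategy;
    }

    /**
     * Calls {@code attempt} up to 1 + maxRetries times, stopping at the first success.
     * {@code attempt} returns true if that checkpoint attempt succeeded.
     */
    CheckpointOutcome run(BooleanSupplier attempt) {
        for (int i = 0; i <= maxRetries; i++) {
            if (attempt.getAsBoolean()) {
                return CheckpointOutcome.COMPLETED;
            }
        }
        // All attempts failed: apply the configured failure strategy.
        return strategy == FailureStrategy.SKIP_CHECKPOINT
                ? CheckpointOutcome.SKIPPED
                : CheckpointOutcome.JOB_FAILED;
    }
}
```

For example, `new CheckpointRetryPolicy(2, FailureStrategy.SKIP_CHECKPOINT).run(attempt)` tries a checkpoint at most three times and, if all three fail, reports `SKIPPED`, so the job would keep running until the next scheduled checkpoint instead of restarting.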