[ https://issues.apache.org/jira/browse/FLINK-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441180#comment-16441180 ]
ASF GitHub Bot commented on FLINK-4809:
---------------------------------------

Github user rmetzger commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4883#discussion_r182155763

    --- Diff: docs/dev/stream/state/checkpointing.md ---
    @@ -118,6 +120,9 @@ env.getCheckpointConfig.setMinPauseBetweenCheckpoints(500)
         // checkpoints have to complete within one minute, or are discarded
         env.getCheckpointConfig.setCheckpointTimeout(60000)
        +// prevent the tasks from failing if an error happens in their checkpointing, the checkpoint will just be declined.
        +env.getCheckpointConfig.setFailTasksOnCheckpointingErrors(false)
        --- End diff --

        This line is missing from the Java tab.


> Operators should tolerate checkpoint failures
> ---------------------------------------------
>
>                 Key: FLINK-4809
>                 URL: https://issues.apache.org/jira/browse/FLINK-4809
>             Project: Flink
>          Issue Type: Sub-task
>          Components: State Backends, Checkpointing
>            Reporter: Stephan Ewen
>            Assignee: Stefan Richter
>            Priority: Major
>             Fix For: 1.5.0
>
>
> Operators should try/catch exceptions in the synchronous and asynchronous
> part of the checkpoint and send a {{DeclineCheckpoint}} message as a result.
> The decline message should have the failure cause attached to it.
> The checkpoint barrier should be sent anyways as a first step before
> attempting to make a state checkpoint, to make sure that downstream operators
> do not block in alignment.


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
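The behaviour requested in the issue can be illustrated with a small, self-contained Java sketch. It does not use Flink's API: SketchOperator, the DeclineCheckpoint class (a stand-in for the {{DeclineCheckpoint}} message named above) and the failTasksOnCheckpointingErrors flag (modelled loosely on the setFailTasksOnCheckpointingErrors(false) option in the diff) are all hypothetical, and plain lists stand in for the network and the checkpoint coordinator. The operator forwards the barrier first, then runs a synchronous and an asynchronous snapshot part; on failure it either declines that checkpoint with the cause attached or, if the flag is set, rethrows to fail the task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Supplier;

public class CheckpointDeclineSketch {

    /** Hypothetical stand-in for the DeclineCheckpoint message: carries the failure cause. */
    static final class DeclineCheckpoint {
        final long checkpointId;
        final Throwable cause;

        DeclineCheckpoint(long checkpointId, Throwable cause) {
            this.checkpointId = checkpointId;
            this.cause = cause;
        }
    }

    /** Hypothetical operator; the lists stand in for the network and the coordinator. */
    static final class SketchOperator {
        private final boolean failTasksOnCheckpointingErrors;
        final List<Long> barriersSentDownstream = new ArrayList<>();
        final List<Long> acknowledged = new ArrayList<>();
        final List<DeclineCheckpoint> declines = new ArrayList<>();

        SketchOperator(boolean failTasksOnCheckpointingErrors) {
            this.failTasksOnCheckpointingErrors = failTasksOnCheckpointingErrors;
        }

        void triggerCheckpoint(long id, Supplier<byte[]> syncPart, Supplier<byte[]> asyncPart) {
            // Forward the barrier before touching state, so downstream operators
            // waiting in barrier alignment are not blocked if the snapshot fails.
            barriersSentDownstream.add(id);

            // Synchronous part: runs on the task thread.
            try {
                syncPart.get();
            } catch (RuntimeException e) {
                onCheckpointFailure(id, e);
                return;
            }

            // Asynchronous part: runs on another thread. This sketch simply waits
            // for it; a failure surfaces as a CompletionException wrapping the cause.
            try {
                CompletableFuture.supplyAsync(asyncPart).join();
                acknowledged.add(id);
            } catch (CompletionException e) {
                onCheckpointFailure(id, e.getCause());
            }
        }

        private void onCheckpointFailure(long id, Throwable cause) {
            if (failTasksOnCheckpointingErrors) {
                // Flag set: the checkpointing error fails the whole task.
                throw new IllegalStateException("checkpoint " + id + " failed", cause);
            }
            // Flag cleared: decline only this checkpoint, with the cause attached.
            declines.add(new DeclineCheckpoint(id, cause));
        }
    }

    public static void main(String[] args) {
        SketchOperator op = new SketchOperator(false);
        op.triggerCheckpoint(1L, () -> new byte[] {1}, () -> new byte[] {2});
        op.triggerCheckpoint(2L, () -> { throw new IllegalStateException("disk full"); }, () -> new byte[0]);
        System.out.println("barriers=" + op.barriersSentDownstream
                + " acked=" + op.acknowledged + " declined=" + op.declines.size());
    }
}
```

Sending the barrier before the snapshot attempt is what keeps a failed snapshot from stalling downstream alignment; with the flag cleared, only the affected checkpoint is declined and the task keeps processing records.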