
Re: Exceeded Checkpoint tolerable failure

2022-12-11 Thread Hangxiang Yu

Hi, Madan. 5s may be too small for the checkpoint timeout configuration. I see the timeout is related to back pressure, as you said. You may also find that the "start delay" metric in 1.14 is longer than the one in 1.9. I'd like to suggest that we increase the configuration of checkpoint timeout and compare t
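
For illustration, raising the checkpoint timeout might look like the fragment below. This is only a sketch: it assumes the job picks up its settings from flink-conf.yaml and does not override them in code, and the 60s value is an arbitrary example, not a value recommended in this thread.

```yaml
# flink-conf.yaml (sketch; the value is an assumption for illustration)
# Maximum time a checkpoint may take before it is declared failed.
# The thread mentions 5s as likely too small under back pressure.
execution.checkpointing.timeout: 60s
```

With a longer timeout in place, the "start delay" metric on the "Checkpoint" tab can be compared between the two versions without checkpoints being cut off first.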

Re: Exceeded Checkpoint tolerable failure

2022-12-08 Thread Yanfei Lei

Hi Madan, Maybe you can check the value of "execution.checkpointing.tolerable-failed-checkpoints" [1] in your application configuration, and try to increase this value? [1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/config/#execution-checkpointing-tolerable-fail

Re: Exceeded Checkpoint tolerable failure

2022-12-08 Thread Hangxiang Yu

Hi, Madan. I think there is a root cause behind the exception; could you share it? BTW, if you haven't set a value for execution.checkpointing.tolerable-failed-checkpoints, I'd recommend setting one, which can avoid job restarts caused by recoverable temporary problems. [1] https://nightlies.apache
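
As a rough sketch of the setting discussed above, the option could be set in flink-conf.yaml as follows. The value 3 is an assumption chosen for illustration; the right number depends on how often checkpoints fail for recoverable reasons in the job at hand.

```yaml
# flink-conf.yaml (sketch; the value is an assumption for illustration)
# Number of checkpoint failures tolerated before the job is failed.
# 0 means no checkpoint failure is tolerated.
execution.checkpointing.tolerable-failed-checkpoints: 3
```

This only keeps the job from restarting on temporary problems; the root cause of the failing checkpoints still needs to be found.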

Re: Exceeded Checkpoint tolerable failure threshold Exception

2021-10-07 Thread Caizhi Weng

Hi! You need to look into the root cause of the checkpoint failure. Check the "Checkpoint" tab to see whether a checkpoint timeout occurred, or the "Exception" tab for exception messages other than this one. You can also dive into the logs for suspicious information. If checkpoint failures are rar