Hi, Madan.
A checkpoint timeout of 5s may be too small.
As you said, the timeout looks related to back pressure. You may also
find that the "start delay" metric is longer in 1.14 than in 1.9.
I'd suggest that we increase the checkpoint timeout and compare t
Hi Madan,
Maybe you can check the value of
"execution.checkpointing.tolerable-failed-checkpoints" [1] in your
application configuration and try increasing it?
[1]
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/config/#execution-checkpointing-tolerable-fail
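
As a minimal sketch, the two settings discussed in this thread could be set
in flink-conf.yaml roughly as follows. The keys are Flink configuration
options; the values are illustrative assumptions only and should be tuned to
the job:

```yaml
# Assumed example values, not recommendations from this thread.

# Allow each checkpoint more time to complete than the 5s mentioned above.
execution.checkpointing.timeout: 10min

# Tolerate this many checkpoint failures before the job is failed over.
execution.checkpointing.tolerable-failed-checkpoints: 3
```

Raising the tolerance only hides occasional failures; if checkpoints fail
consistently, the underlying cause still needs to be found.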
Hi, Madan.
I think there is an underlying root cause behind the exception. Could you
share it?
BTW, if you haven't set a value for
execution.checkpointing.tolerable-failed-checkpoints, I'd recommend
setting one, which can avoid job restarts caused by recoverable, temporary
problems.
[1]
https://nightlies.apache
Hi!
You need to look into the root cause of the checkpoint failure. Check the
"Checkpoint" tab to see whether a checkpoint timeout occurred, or the
"Exception" tab for exception messages other than this one. You can also
dig into the logs for suspicious information.
If checkpoint failures are rar