Hi Madan, Maybe you can check the value of " *execution.checkpointing.tolerable-failed-checkpoints"*[1] in your application configuration, and try to increase this value?
[1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/config/#execution-checkpointing-tolerable-failed-checkpoints Madan D via user <user@flink.apache.org> 于2022年12月8日周四 11:40写道: > Hello All, > I am seeing below issue after I upgraded from 1.9.0 to 1.14.2 while > publishing messages to pub sub which is causing frequent job restart and > slow processing. > > Can you please help me. > > `Caused by: org.apache.flink.util.FlinkRuntimeException: Exceeded > checkpoint tolerable failure threshold. > at > org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleCheckpointException(CheckpointFailureManager.java:98) > at > org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleJobLevelCheckpointException(CheckpointFailureManager.java:67) > at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1940) > at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1912) > at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator.access$600(CheckpointCoordinator.java:98) > at > org.apache.flink.runtime.checkpoint.CheckpointCoordinator$CheckpointCanceller.run(CheckpointCoordinator.java:1996) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at java.util.concurrent.ThreadPoolExecut > > Regards, > Madan > > > -- Best, Yanfei