Hi!

This exception only tells you that too many checkpoints have failed, so you
need to look into the root cause of the checkpoint failures. In the Flink web
UI, check the "Checkpoint" tab to see whether checkpoints are timing out, or
the "Exception" tab for exception messages other than this one. You can also
search the logs for suspicious entries.

If checkpoint failures are rare and you would like to tolerate them,
set execution.checkpointing.tolerable-failed-checkpoints to the number of
failed checkpoints you are willing to accept. For documentation, see
https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/config/#execution-checkpointing-tolerable-failed-checkpoints
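
As a rough sketch, the setting could go into flink-conf.yaml like this. The
values 3 and 20 min are placeholders I picked for illustration, not
recommendations, and the timeout line is only worth considering if the
"Checkpoint" tab shows that checkpoints are timing out:

```yaml
# flink-conf.yaml (sketch; values are illustrative assumptions)

# Number of failed checkpoints to tolerate before the job is failed.
execution.checkpointing.tolerable-failed-checkpoints: 3

# Assumption: raise the checkpoint timeout only if timeouts are the cause.
execution.checkpointing.timeout: 20 min
```

Raising these values only hides the symptom, so it is still worth finding out
why the checkpoints fail in the first place.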

Robert Cullen <cinquate...@gmail.com> wrote on Fri, Oct 8, 2021 at 12:49 AM:

> I have Flink set up with 2 taskmanagers and one jobmanager. I've allocated
> 25 GB of JVM heap and 15 GB of Flink managed memory. I have 2 jobs
> running. After 3 hours this exception was thrown. How can I configure
> Flink to prevent this from happening?
>
> 2021-10-07 12:38:50
> org.apache.flink.util.FlinkRuntimeException: Exceeded checkpoint tolerable failure threshold.
>     at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleCheckpointException(CheckpointFailureManager.java:98)
>     at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleJobLevelCheckpointException(CheckpointFailureManager.java:67)
>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1934)
>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1906)
>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.access$600(CheckpointCoordinator.java:96)
>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator$CheckpointCanceller.run(CheckpointCoordinator.java:1990)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>
> --
> Robert Cullen
> 240-475-4490
>
