I have Flink set up with two TaskManagers and one JobManager. I've allocated 25 GB of JVM heap and 15 GB of Flink managed memory. I have two jobs running. After three hours, the following exception was thrown. How can I configure Flink to prevent this from happening?

2021-10-07 12:38:50
org.apache.flink.util.FlinkRuntimeException: Exceeded checkpoint tolerable failure threshold.
    at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleCheckpointException(CheckpointFailureManager.java:98)
    at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleJobLevelCheckpointException(CheckpointFailureManager.java:67)
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1934)
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1906)
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.access$600(CheckpointCoordinator.java:96)
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator$CheckpointCanceller.run(CheckpointCoordinator.java:1990)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

--
Robert Cullen
240-475-4490
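
For readers of this thread, it may help to see what the message "Exceeded checkpoint tolerable failure threshold" is counting. The CheckpointCanceller frame in the trace suggests that a pending checkpoint was aborted after it ran past the checkpoint timeout, and that this failure pushed the number of consecutive failed checkpoints over the configured tolerance. The sketch below is a simplified, hypothetical model of that counting logic written for illustration only; the class CheckpointFailureModel and its method names are assumptions and are not Flink's actual implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Simplified, hypothetical model of a "tolerable consecutive failures" counter.
 * This is NOT Flink's CheckpointFailureManager; names and structure are assumptions.
 */
public class CheckpointFailureModel {
    // Maximum number of consecutive failures that is still tolerated.
    private final int tolerableFailures;
    // Failures seen since the last successful checkpoint.
    private final AtomicInteger continuousFailures = new AtomicInteger(0);

    public CheckpointFailureModel(int tolerableFailures) {
        if (tolerableFailures < 0) {
            throw new IllegalArgumentException("tolerableFailures must be >= 0");
        }
        this.tolerableFailures = tolerableFailures;
    }

    /** A completed checkpoint resets the consecutive-failure count. */
    public void onCheckpointSuccess() {
        continuousFailures.set(0);
    }

    /** A failed (for example, timed-out) checkpoint increments the count and may trip the threshold. */
    public void onCheckpointFailure() {
        if (continuousFailures.incrementAndGet() > tolerableFailures) {
            throw new IllegalStateException("Exceeded checkpoint tolerable failure threshold.");
        }
    }

    public int getContinuousFailures() {
        return continuousFailures.get();
    }

    public static void main(String[] args) {
        CheckpointFailureModel model = new CheckpointFailureModel(2);
        model.onCheckpointFailure();
        model.onCheckpointFailure();
        model.onCheckpointSuccess(); // resets the counter
        model.onCheckpointFailure();
        System.out.println("continuous failures: " + model.getContinuousFailures());
    }
}
```

In a real job, the corresponding settings are exposed on CheckpointConfig, for example setTolerableCheckpointFailureNumber(int) and setCheckpointTimeout(long). Raising the tolerance only hides the symptom, though; if checkpoints are timing out, it is probably worth checking checkpoint duration, state size, and backpressure in the web UI first.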