Hi, the problem is with the checkpoints. We need to understand why they are failing. Could you provide more logs? In particular, we need the JobManager and TaskManager logs around the first failed checkpoint to see the details.
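While you collect the logs, one thing worth checking is how the timeout in the settings quoted below compares with the interval. The stack trace passes through CheckpointCoordinator$CheckpointCanceller, which as far as I know is the task that aborts a checkpoint once its timeout has elapsed, so the failures may be timeouts. Here is a rough Python sketch of that check. It assumes the flink.checkpoint.* keys are your application's own property names (they are not Flink configuration keys), and the helper names and the "timeout <= interval" rule are only illustrative heuristics, not an official recommendation.

```python
# Settings copied from the quoted message. The key names are the poster's
# application properties, not Flink's own configuration keys.
QUOTED_SETTINGS = """
flink.checkpoint.interval=10000
flink.checkpoint.minPauseInterval=500
flink.checkpoint.Timeout=10000
flink.checkpoint.maxConcurrent=1
"""


def parse_properties(text):
    """Parse key=value lines into a dict, skipping blank and comment lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def checkpoint_warnings(props):
    """Return warnings for tight checkpoint timing (hypothetical heuristic)."""
    interval_ms = int(props["flink.checkpoint.interval"])
    timeout_ms = int(props["flink.checkpoint.Timeout"])
    warnings = []
    # Assumption: a timeout no larger than the interval leaves little headroom
    # for a slow checkpoint (for example, a slow S3 upload) before it is aborted.
    if timeout_ms <= interval_ms:
        warnings.append(
            f"timeout ({timeout_ms} ms) is not larger than "
            f"interval ({interval_ms} ms)"
        )
    return warnings


if __name__ == "__main__":
    for warning in checkpoint_warnings(parse_properties(QUOTED_SETTINGS)):
        print("WARNING:", warning)
```

With the quoted values, the check flags the 10000 ms timeout. Whether that is the actual cause can only be confirmed from the logs of the first failed checkpoint.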
> On 17 Mar 2022, at 9:41 AM, Vijayendra Yadav <contact....@gmail.com> wrote:
>
> Hi Flink Team,
>
> I am using Flink 1.11 with the Kinesis consumer and an S3 streaming file
> sink, with S3 as the checkpoint backend. This is a streaming service.
>
> Usually a couple of checkpoints fail without causing issues. After a week
> or so of running, the checkpoint failures become irrecoverable: the
> application keeps running, but in a bad state, and the data flow blocks.
>
> Refer to the graph below:
> <image.png>
>
> Flink checkpoint configuration is as below.
> Note: time units are in milliseconds.
>
> flink.checkpoint.interval=10000
> flink.checkpoint.minPauseInterval=500
> flink.checkpoint.Timeout=10000
> flink.checkpoint.maxConcurrent=1
> flink.checkpoint.preferCheckPoint=true
>
> kinesis.shard.getrecords.max=10000
> kinesis.shard.getrecords.interval=10000
> kinesis.initial.position=LATEST
>
> EXCEPTION on job:
>
> org.apache.flink.util.FlinkRuntimeException: Exceeded checkpoint tolerable failure threshold.
>     at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleJobLevelCheckpointException(CheckpointFailureManager.java:66)
>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1626)
>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1603)
>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.access$600(CheckpointCoordinator.java:90)
>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator$CheckpointCanceller.run(CheckpointCoordinator.java:1736)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)