【Symptom】Two jobs with the same configuration (checkpoint interval: 3
minutes; job logic: regular join) behave differently. On Flink 1.9 the job
runs normally. On Flink 1.12, after the job has run for a while, the
checkpoint duration keeps increasing and eventually checkpoints fail.

【Analysis】As far as I understand, the checkpoint mechanism was changed in
Flink 1.10. During barrier alignment, the receiver no longer caches data
from a channel once that channel's barrier has arrived. The sender therefore
has to wait for credit feedback after alignment completes before it can
transmit data again, so it goes through a kind of cold start that affects
latency and network throughput. To check this, I raised the checkpoint
interval to 10 minutes for a comparative test, and with that setting the job
runs normally on Flink 1.12.
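
For reference, the two runs compared above differ only in the checkpoint
interval. A minimal sketch of the corresponding flink-conf.yaml entry,
assuming the interval is set through configuration rather than in job code
(which of the two is used here is not essential to the comparison):

```yaml
# Failing setup on Flink 1.12: checkpoint every 3 minutes.
execution.checkpointing.interval: 3 min

# Comparison run that stays healthy on Flink 1.12:
# execution.checkpointing.interval: 10 min
```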

Issue: https://issues.apache.org/jira/browse/FLINK-16404

【Questions】
1. Has anyone else encountered the same problem?
2. Can Flink 1.12 be run with a small checkpoint interval?

With a checkpoint interval of 3 minutes, checkpoint creation fails after the
Flink 1.12 job has been running for 5 hours. The exception stack trace:
org.apache.flink.util.FlinkRuntimeException: Exceeded checkpoint tolerable failure threshold.
        at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleCheckpointException(CheckpointFailureManager.java:96)
        at org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleJobLevelCheckpointException(CheckpointFailureManager.java:65)
        at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1924)
        at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1897)
        at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.access$600(CheckpointCoordinator.java:93)
        at org.apache.flink.runtime.checkpoint.CheckpointCoordinator$CheckpointCanceller.run(CheckpointCoordinator.java:2038)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
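
The CheckpointCanceller frame in the trace suggests that the pending
checkpoint was aborted because it exceeded its timeout, and that the job then
failed because the number of tolerated checkpoint failures was exceeded. A
sketch of the two related flink-conf.yaml settings follows. The values are
illustrative assumptions, not tested recommendations, and relaxing them
would only hide slow checkpoints rather than remove the alignment delay:

```yaml
# Maximum time a checkpoint may take before it is aborted.
# (10 min is, to my knowledge, the default.)
execution.checkpointing.timeout: 10 min

# Number of consecutive checkpoint failures tolerated before the job fails.
# The value 3 is a hypothetical example.
execution.checkpointing.tolerable-failed-checkpoints: 3
```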



