Dear Flink community,

We are running a proof of concept with Flink 1.8 to process data in real time. We use global checkpointing (S3) and local checkpointing (EBS), and the cluster is deployed on EKS. Our application consumes data from Kinesis.

In my test I use a checkpointing interval of 5 min and a minimum pause of 2 min.

The issue we see: the checkpointing process appears to sit idle for 3-4 min before the JobManager gets the completion notification. Here is the relevant logging from the JobManager:

2019-07-10 11:59:03,893 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 4 @ 1562759941082 for job e7a97014f5799458f1c656135712813d.
2019-07-10 12:05:01,836 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Completed checkpoint 4 for job e7a97014f5799458f1c656135712813d (22387207650 bytes in 58645 ms).

As I understand the logging above, the CompletedCheckpoint object (CheckpointCoordinator) was completed in 58645 ms, but the whole checkpointing process took about 6 min. This logging is for the 4th checkpoint; the first 3 checkpoints finished on time.

Could you please tell me why checkpointing in my test started to "idle" for a few minutes after the first 3 checkpoints?

Best regards,
--
Xiangyu Su
Java Developer
Smaato Inc.
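
P.S. To make the gap concrete, here is a small Java sketch that compares the wall-clock time between the two log lines above with the duration reported in the "Completed checkpoint" line. The class and method names (CheckpointGap, wallClockMillis, unaccountedMillis) are made up for this illustration and are not part of Flink. The only inputs are the two log timestamps and the 58645 ms copied from the log.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class CheckpointGap {
    // Timestamp layout of the JobManager log lines quoted above.
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Milliseconds elapsed between two log timestamps. */
    static long wallClockMillis(String triggeredAt, String completedAt) {
        LocalDateTime start = LocalDateTime.parse(triggeredAt, LOG_TIME);
        LocalDateTime end = LocalDateTime.parse(completedAt, LOG_TIME);
        return Duration.between(start, end).toMillis();
    }

    /** Wall-clock time that the reported checkpoint duration does not cover. */
    static long unaccountedMillis(String triggeredAt, String completedAt, long reportedMillis) {
        return wallClockMillis(triggeredAt, completedAt) - reportedMillis;
    }

    public static void main(String[] args) {
        String triggered = "2019-07-10 11:59:03,893"; // "Triggering checkpoint 4" line
        String completed = "2019-07-10 12:05:01,836"; // "Completed checkpoint 4" line
        long reported = 58645L;                       // duration printed by the coordinator

        System.out.println("wall clock ms:  " + wallClockMillis(triggered, completed));
        System.out.println("unaccounted ms: " + unaccountedMillis(triggered, completed, reported));
    }
}
```

With the values from the log, this gives 357943 ms between the two log lines, of which 299298 ms (roughly 5 min) are not covered by the reported 58645 ms.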