[ https://issues.apache.org/jira/browse/FLINK-22805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354188#comment-17354188 ]
Jiayi Liao commented on FLINK-22805:
------------------------------------

This is a good point. But I think the root problem is that the periodic checkpoint scheduler in {{CheckpointCoordinator}} is too simple to cover different scenarios. We have run into several scenarios that the periodic scheduler cannot satisfy:
 * When transferring data from Kafka to a Hive partitioned table, users usually want a checkpoint to happen as soon as possible once a Hive partition is finished.
 * Different intervals and timeouts for different traffic levels. From the user's perspective, what they care about is how much data they need to backtrack if the job fails, which means a shorter interval under heavy traffic and a longer interval under light traffic.

At Bytedance, we abstracted a {{CheckpointScheduler}} in {{CheckpointCoordinator}} that is responsible for scheduling checkpoints and can also be extended by users.

> Dynamic configuration of Flink checkpoint interval
> --------------------------------------------------
>
>                 Key: FLINK-22805
>                 URL: https://issues.apache.org/jira/browse/FLINK-22805
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.13.1
>            Reporter: Fu Kai
>            Priority: Critical
>             Fix For: 1.14.0
>
>
> Flink currently does not support changing the checkpoint interval
> on the fly. This would be useful for use cases like backfill/cold-start from a stream
> containing the whole history.
>
> In the cold-start phase, resources are fully utilized and back-pressure
> is high for all upstream operators, causing checkpoints to time out
> constantly. The real production traffic is far lower than that, and the
> provisioned resources are capable of handling it.
>
> With dynamic checkpoint interval configuration, the cold-start process
> can be sped up by checkpointing less frequently, or even by turning checkpointing off.
> After the process is completed, the checkpoint interval can be set back to
> normal.
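
To make the scheduler idea from the comment more concrete, here is a minimal sketch of what a pluggable scheduler could look like for the "shorter interval on heavy traffic" scenario. Everything in it is an assumption for illustration: the {{CheckpointScheduler}} interface shape, the {{nextDelay}} method, and both implementing classes are hypothetical and are not the Bytedance implementation or an existing Flink API.

```java
import java.time.Duration;

/** Hypothetical extension point; NOT an existing Flink interface. */
interface CheckpointScheduler {
    /** Returns how long to wait before triggering the next checkpoint. */
    Duration nextDelay(long recordsPerSecond);
}

/** Fixed delay regardless of traffic, i.e. a plain periodic scheduler. */
final class FixedIntervalScheduler implements CheckpointScheduler {
    private final Duration interval;

    FixedIntervalScheduler(Duration interval) {
        this.interval = interval;
    }

    @Override
    public Duration nextDelay(long recordsPerSecond) {
        return interval;
    }
}

/**
 * Picks the delay so that roughly at most maxReplayRecords have to be
 * reprocessed after a failure, clamped to [minInterval, maxInterval].
 */
final class TrafficAwareScheduler implements CheckpointScheduler {
    private final long maxReplayRecords;
    private final Duration minInterval;
    private final Duration maxInterval;

    TrafficAwareScheduler(long maxReplayRecords, Duration minInterval, Duration maxInterval) {
        this.maxReplayRecords = maxReplayRecords;
        this.minInterval = minInterval;
        this.maxInterval = maxInterval;
    }

    @Override
    public Duration nextDelay(long recordsPerSecond) {
        if (recordsPerSecond <= 0) {
            // No measurable traffic: fall back to the longest allowed interval.
            return maxInterval;
        }
        // Time (ms) it takes to accumulate maxReplayRecords at the current rate.
        long millis = (long) (maxReplayRecords * 1000.0 / recordsPerSecond);
        long clamped = Math.max(minInterval.toMillis(),
                Math.min(maxInterval.toMillis(), millis));
        return Duration.ofMillis(clamped);
    }
}
```

With a replay budget of 600,000 records and bounds of 10 seconds and 10 minutes, this sketch yields a 60-second delay at 10,000 records/s and falls back to the bounds at very high or very low rates. An event-driven case such as "checkpoint when a Hive partition is finished" would need a different hook (an explicit trigger rather than a delay), which this sketch does not cover.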

--
This message was sent by Atlassian Jira
(v8.3.4#803005)