[jira] [Commented] (FLINK-22805) Dynamic configuration of Flink checkpoint interval

Jiayi Liao (Jira) Sun, 30 May 2021 19:52:07 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-22805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354188#comment-17354188
 ]


Jiayi Liao commented on FLINK-22805:
------------------------------------

This is a good point. But I think the root problem is that the periodic 
checkpoint scheduler in {{CheckpointCoordinator}} is too simple to cover 
different scenarios. We've met several scenarios that the periodic 
scheduler cannot satisfy: 

* Transferring data from Kafka to a partitioned Hive table: users usually want 
a checkpoint to happen as soon as possible once a Hive partition is finished. 
* Different intervals and timeouts for different traffic levels. From the user's 
perspective, what matters is how much data has to be reprocessed if 
the job fails, which means a shorter interval under heavy traffic and a longer 
interval under light traffic. 

At Bytedance, we abstracted a {{CheckpointScheduler}} in 
{{CheckpointCoordinator}} that is responsible for scheduling checkpoints and 
can also be extended by users. 
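
To make the idea concrete, here is a minimal sketch of what such a pluggable scheduler could look like. It is not Flink's API and not our internal implementation: the interface shape, the {{nextDelay}} method, the {{TrafficAwareScheduler}} class, and the records-per-second input are all hypothetical names chosen for illustration. It only covers the traffic-based scenario above, not the Hive partition one.

```java
import java.time.Duration;

/** Hypothetical sketch only: not Flink's API, all names are assumptions. */
final class CheckpointSchedulerSketch {

    /** Decides how long to wait before triggering the next checkpoint. */
    interface CheckpointScheduler {
        Duration nextDelay(long recordsPerSecond);
    }

    /** Behaves like a plain periodic scheduler: the delay never changes. */
    static final class FixedIntervalScheduler implements CheckpointScheduler {
        private final Duration interval;

        FixedIntervalScheduler(Duration interval) {
            this.interval = interval;
        }

        @Override
        public Duration nextDelay(long recordsPerSecond) {
            return interval;
        }
    }

    /** Shorter interval under heavy traffic, longer interval under light traffic. */
    static final class TrafficAwareScheduler implements CheckpointScheduler {
        private final long heavyThreshold; // records/s at or above which traffic counts as heavy
        private final Duration heavyInterval;
        private final Duration lightInterval;

        TrafficAwareScheduler(long heavyThreshold, Duration heavyInterval, Duration lightInterval) {
            this.heavyThreshold = heavyThreshold;
            this.heavyInterval = heavyInterval;
            this.lightInterval = lightInterval;
        }

        @Override
        public Duration nextDelay(long recordsPerSecond) {
            return recordsPerSecond >= heavyThreshold ? heavyInterval : lightInterval;
        }
    }

    public static void main(String[] args) {
        CheckpointScheduler scheduler =
                new TrafficAwareScheduler(10_000, Duration.ofSeconds(30), Duration.ofMinutes(5));
        System.out.println(scheduler.nextDelay(50_000)); // heavy traffic: PT30S
        System.out.println(scheduler.nextDelay(100));    // light traffic: PT5M
    }
}
```

In a design like this, the coordinator would ask the scheduler for the next delay after each checkpoint instead of using a fixed period, so a user-supplied implementation could change the interval without touching the coordinator itself.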

> Dynamic configuration of Flink checkpoint interval
> --------------------------------------------------
>
>                 Key: FLINK-22805
>                 URL: https://issues.apache.org/jira/browse/FLINK-22805
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.13.1
>            Reporter: Fu Kai
>            Priority: Critical
>             Fix For: 1.14.0
>
>
> Flink currently does not support changing the checkpoint interval 
> on the fly. This would be useful for use cases like backfill/cold-start from 
> a stream containing the whole history.
>  
> In the cold-start phase, resources are fully utilized and back-pressure 
> is high for all upstream operators, causing checkpoints to time out 
> constantly. Real production traffic is far lower than that, and the 
> provisioned resources are capable of handling it. 
>  
> With a dynamically configurable checkpoint interval, the cold-start process 
> can be sped up by checkpointing less frequently or even turning 
> checkpointing off. After the process completes, the checkpoint interval can 
> be restored to normal.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
