[ https://issues.apache.org/jira/browse/FLINK-37319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17928904#comment-17928904 ]

Zhenqiu Huang commented on FLINK-37319:
---------------------------------------

I agree. It is an SLA given by the cloud provider. I think we can add a config
option for the retry delay. It would default to -1, which means no retry. For
our internal use case, we can set it to 90 seconds. In that case, checkpoint
completion time will increase by 90 seconds for all jobs. If the checkpoint
interval is too short, this could cause failures.
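To make the proposal concrete, here is a minimal Java sketch of a retry-delay setting with the semantics described above: -1 (the default) disables retry, and any non-negative value is the wait in milliseconds before a single retry. The class `UploadRetrySketch`, its methods, and the one-retry policy are hypothetical illustrations, not Flink's actual `RocksDBStateUploader` code or configuration. The sketch also includes a rough worst-case estimate showing why a long delay can collide with a short checkpoint interval.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

/**
 * Hypothetical retry wrapper for a state-file upload (illustration only, not
 * part of Flink's API). A delay of -1 disables retry; any value >= 0 is the
 * wait in milliseconds before exactly one retry.
 */
final class UploadRetrySketch {

    /** Sentinel matching the proposed default: -1 means "do not retry". */
    static final long NO_RETRY = -1L;

    private final long retryDelayMillis;

    UploadRetrySketch(long retryDelayMillis) {
        if (retryDelayMillis < 0 && retryDelayMillis != NO_RETRY) {
            throw new IllegalArgumentException("retry delay must be >= 0 or -1");
        }
        this.retryDelayMillis = retryDelayMillis;
    }

    /** Runs the upload; on failure, waits for the configured delay and tries once more. */
    <T> T upload(Callable<T> attempt) throws Exception {
        try {
            return attempt.call();
        } catch (Exception first) {
            if (retryDelayMillis == NO_RETRY) {
                throw first; // default behaviour: fail immediately, as today
            }
            Thread.sleep(retryDelayMillis);
            try {
                return attempt.call();
            } catch (Exception second) {
                if (second != first) {
                    second.addSuppressed(first); // keep the original cause visible
                }
                throw second;
            }
        }
    }

    /**
     * Rough check (assumed model): worst case is a full failed attempt, the
     * delay, then a full second attempt, compared against the interval.
     */
    boolean fitsWithinInterval(Duration expectedUpload, Duration checkpointInterval) {
        long attemptMs = expectedUpload.toMillis();
        long worstCaseMs = retryDelayMillis == NO_RETRY
                ? attemptMs
                : attemptMs + retryDelayMillis + attemptMs;
        return worstCaseMs <= checkpointInterval.toMillis();
    }
}
```

For example, with a 90-second delay and an upload that takes about 20 seconds, the worst case under this model is roughly 20 + 90 + 20 = 130 seconds, which would not fit in a 60-second checkpoint interval but would fit in a 3-minute one.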


> Add retry in RocksDBStateUploader for fault tolerant
> ----------------------------------------------------
>
>                 Key: FLINK-37319
>                 URL: https://issues.apache.org/jira/browse/FLINK-37319
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.20.0, 1.20.1
>            Reporter: Zhenqiu Huang
>            Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
