Hi Zhijiang,
        Thanks for your response.
        I added the checkpointAlignmentTime metric. The data shows that the 
checkpointDuration is about 150s, while the checkpointAlignmentTime is only about 4s. 
There is a big gap between them.
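
To put a number on that gap, here is a minimal sketch of the arithmetic in 
Python. It assumes, following the description quoted below, that the checkpoint 
duration is roughly alignment time plus snapshot time. The function name 
`unexplained_time` is hypothetical, and the inputs are the approximate metric 
values reported above.

```python
def unexplained_time(checkpoint_duration_s: float, alignment_time_s: float) -> float:
    """Return the part of the checkpoint duration not spent on barrier alignment."""
    return checkpoint_duration_s - alignment_time_s


duration_s = 150.0   # approximate checkpointDuration from the metrics
alignment_s = 4.0    # approximate checkpointAlignmentTime from the metrics

remaining_s = unexplained_time(duration_s, alignment_s)
share = remaining_s / duration_s  # fraction of the duration that is not alignment
print(f"{remaining_s:.0f}s of {duration_s:.0f}s ({share:.0%}) is not alignment")
```

Under that assumption, almost all of the duration would have to come from 
something other than barrier alignment.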

Best
Henry

> On Oct 10, 2018, at 1:26 PM, Zhijiang(wangzhijiang999) <wangzhijiang...@aliyun.com> 
> wrote:
> 
> The checkpoint duration includes both barrier alignment and the state 
> snapshot. Every task has to receive the barriers from all of its input 
> channels before it triggers the state snapshot.
> I guess the barrier alignment may take a long time in your case, and it is 
> especially critical under backpressure. You can check the 
> "checkpointAlignmentTime" metric to confirm.
> 
> Best,
> Zhijiang
> ------------------------------------------------------------------
> From: 徐涛 <happydexu...@gmail.com>
> Sent: Wednesday, October 10, 2018, 13:13
> To: user <user@flink.apache.org>
> Subject: Small checkpoint data takes too much time
> 
> Hi,
>  I recently encountered a problem in production: checkpoints take too 
> much time, although this doesn't affect job execution.
>  I am using FsStateBackend with asynchronousSnapshots, writing the data to an 
> HDFS checkpointDataUri. I printed the metrics “lastCheckpointDuration” and 
> “lastCheckpointSize”. They show that “lastCheckpointSize” is about 80KB, but 
> “lastCheckpointDuration” is about 160s! Because the checkpoint data is small, 
> I think it should not take that long. I do not know why, or which 
> conditions may influence the checkpoint time. Has anyone encountered such a 
> problem?
>  Thanks a lot.
> 
> Best
> Henry
> 
