Re: Small checkpoint data takes too much time

徐涛 Thu, 11 Oct 2018 04:22:56 -0700

Hi Zhijiang,
        Thanks for your response.
        I added the checkpointAlignmentTime metric. The data shows that the 
checkpoint duration is about 150s, while checkpointAlignmentTime is only about 4s. 
There is a big gap between them.
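
To make the gap concrete, here is a minimal sketch of the arithmetic. It assumes, 
as Zhijiang describes, that the checkpoint duration is roughly barrier alignment 
plus state snapshot. The class and method names are hypothetical (this is not 
Flink API), and the numbers are the approximate values from my metrics.

```java
// Hypothetical helper, not part of Flink: splits a checkpoint duration into
// the part spent on barrier alignment and the remainder.
final class CheckpointGap {

    // Seconds of the checkpoint that barrier alignment does not account for.
    static long nonAlignmentSeconds(long durationSeconds, long alignmentSeconds) {
        if (alignmentSeconds < 0 || alignmentSeconds > durationSeconds) {
            throw new IllegalArgumentException("alignment must be within [0, duration]");
        }
        return durationSeconds - alignmentSeconds;
    }

    public static void main(String[] args) {
        long duration = 150;  // approximate checkpoint duration, in seconds
        long alignment = 4;   // approximate checkpointAlignmentTime, in seconds
        long rest = nonAlignmentSeconds(duration, alignment);
        System.out.println("Not explained by alignment: " + rest + "s of " + duration + "s");
    }
}
```

With these numbers nearly all of the time falls outside alignment, which would 
point at the snapshot phase or something else rather than at the barriers.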


Best
Henry

> On Oct 10, 2018, at 1:26 PM, Zhijiang(wangzhijiang999) <wangzhijiang...@aliyun.com> 
> wrote:
> 
> The checkpoint duration includes both barrier alignment and the state 
> snapshot. Every task has to receive the barriers from all of its input 
> channels before it triggers the state snapshot.
> I guess the barrier alignment may take a long time in your case; it is 
> especially critical under backpressure. You can check the 
> "checkpointAlignmentTime" metric to confirm.
> 
> Best,
> Zhijiang
> ------------------------------------------------------------------
> From: 徐涛 <happydexu...@gmail.com>
> Sent: Wednesday, October 10, 2018 13:13
> To: user <user@flink.apache.org>
> Subject: Small checkpoint data takes too much time
> 
> Hi,
>  I recently encountered a problem in production: checkpoints take too 
> much time, although this doesn't affect job execution.
>  I am using FsStateBackend with asynchronousSnapshots, writing the data to 
> an HDFS checkpointDataUri. I printed the metrics “lastCheckpointDuration” and 
> “lastCheckpointSize”. They show that “lastCheckpointSize” is about 80KB, but 
> “lastCheckpointDuration” is about 160s! Because the checkpoint data is 
> small, I don't think it should take that long. I do not know why, or which 
> conditions may influence the checkpoint time. Has anyone encountered such a 
> problem?
>  Thanks a lot.
> 
> Best
> Henry
>
