Hi Zhijiang,

Thanks for your response. I added the checkpointAlignmentTime metric. The data shows that the checkpoint duration is about 150s, while the checkpointAlignmentTime is only about 4s, so alignment does not account for the large gap between them.
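
As a rough sanity check on these numbers, here is a minimal Python sketch (plain arithmetic, not Flink code; the function name `checkpoint_breakdown` is made up for illustration). It assumes both values are already in seconds and simply subtracts the alignment time from the total duration to see how much is left for everything else, such as the state snapshot.

```python
def checkpoint_breakdown(duration_s, alignment_s):
    """Split a checkpoint duration into alignment time and the remainder.

    Hypothetical helper for illustration only; both inputs are assumed
    to be in seconds.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    if alignment_s < 0 or alignment_s > duration_s:
        raise ValueError("alignment_s must be between 0 and duration_s")
    remainder_s = duration_s - alignment_s
    return {
        "alignment_s": alignment_s,
        "remainder_s": remainder_s,
        # Fraction of the total duration spent on barrier alignment.
        "alignment_share": alignment_s / duration_s,
    }


# Approximate values reported in this thread: 150s duration, 4s alignment.
breakdown = checkpoint_breakdown(150.0, 4.0)
print(breakdown)
```

With the values above, roughly 146s of the 150s is not explained by barrier alignment.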

Best
Henry

> On October 10, 2018, at 1:26 PM, Zhijiang(wangzhijiang999) <wangzhijiang...@aliyun.com> wrote:
>
> The checkpoint duration includes both barrier alignment and the state
> snapshot. Every task has to receive the barriers from all of its input
> channels before it triggers the state snapshot.
> I guess the barrier alignment may take a long time in your case, and it is
> especially critical under backpressure. You can check the
> "checkpointAlignmentTime" metric to confirm.
>
> Best,
> Zhijiang
> ------------------------------------------------------------------
> From: 徐涛 <happydexu...@gmail.com>
> Sent: Wednesday, October 10, 2018, 13:13
> To: user <user@flink.apache.org>
> Subject: Small checkpoint data takes too much time
>
> Hi
> I recently encountered a problem in production. I found that checkpoints take
> too much time, although this doesn't affect the job execution.
> I am using FsStateBackend, writing the data to an HDFS checkpointDataUri, with
> asynchronousSnapshots. I printed the metrics "lastCheckpointDuration" and
> "lastCheckpointSize". They show that "lastCheckpointSize" is about 80KB, but
> "lastCheckpointDuration" is about 160s! Because the checkpoint data is small,
> I think it should not take that long. I do not know why, or which
> conditions may influence the checkpoint time. Has anyone encountered such a
> problem?
> Thanks a lot.
>
> Best
> Henry