Re: Small checkpoint data takes too much time

Zhijiang(wangzhijiang999) Tue, 09 Oct 2018 22:26:45 -0700

The checkpoint duration includes both barrier alignment and the state 
snapshot. Every task has to receive the barriers from all of its input 
channels before it triggers the state snapshot.
I guess barrier alignment may be taking a long time in your case, and it is 
especially critical under backpressure. You can check the 
"checkpointAlignmentTime" metric to confirm.
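To illustrate why an 80KB state can still produce a 160s checkpoint, here is a minimal simulation of barrier alignment for a single task. It is a hypothetical model, not Flink's implementation: `align_barriers`, `AlignmentResult` and the `(time_ms, channel, kind)` event tuples are made up for illustration. It assumes exactly-once alignment, where a channel that has already delivered its barrier is blocked and its further records are buffered until barriers from all channels have arrived.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Hypothetical event format: (arrival time in ms, input channel id, "record" | "barrier")
Event = Tuple[int, int, str]


@dataclass
class AlignmentResult:
    snapshot_time_ms: int   # when the last barrier arrived and the snapshot could start
    alignment_time_ms: int  # time between the first and the last barrier
    buffered_records: int   # records held back from already-aligned channels


def align_barriers(events: Iterable[Event], num_channels: int) -> AlignmentResult:
    """Simulate one task aligning a single checkpoint barrier across its input channels."""
    blocked = set()        # channels whose barrier has already arrived
    buffered = []          # records from blocked channels, held until alignment completes
    first_barrier_ms = None
    for time_ms, channel, kind in sorted(events, key=lambda e: e[0]):
        if kind == "barrier":
            if first_barrier_ms is None:
                first_barrier_ms = time_ms
            blocked.add(channel)
            if len(blocked) == num_channels:
                # All barriers received: the state snapshot is triggered only now.
                return AlignmentResult(time_ms, time_ms - first_barrier_ms, len(buffered))
        elif channel in blocked:
            buffered.append((time_ms, channel))
        # Records on channels that are not yet blocked are processed normally.
    raise ValueError("barriers did not arrive on all channels")


# Channel 1 is backpressured, so its barrier arrives ~160 s after channel 0's.
events = [(0, 0, "record"), (10, 0, "barrier"), (20, 0, "record"),
          (30, 0, "record"), (50, 1, "record"), (160_010, 1, "barrier")]
result = align_barriers(events, num_channels=2)
# result.alignment_time_ms == 160000, result.buffered_records == 2
```

In this model the snapshot itself is trivial; almost the whole duration is spent waiting for the slowest channel's barrier, which is the kind of delay the "checkpointAlignmentTime" metric would reveal.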


Best,
Zhijiang
------------------------------------------------------------------
From: 徐涛 <happydexu...@gmail.com>
Sent: Wednesday, October 10, 2018 13:13
To: user <user@flink.apache.org>
Subject: Small checkpoint data takes too much time

Hi,
 I recently encountered a problem in production: checkpoints take too much 
time, although this doesn't affect job execution.
 I am using FsStateBackend, writing the data to an HDFS checkpointDataUri, with 
asynchronousSnapshots enabled. I printed the metrics "lastCheckpointDuration" 
and "lastCheckpointSize". "lastCheckpointSize" is about 80KB, but 
"lastCheckpointDuration" is about 160s! Since the checkpoint data is so small, I 
don't think it should take that long. I don't know why, or which conditions 
may influence the checkpoint time. Has anyone encountered this problem?
 Thanks a lot.

Best
Henry
