Re: instable checkpointing after migration to flink 1.8 (production issue)

2019-07-18 Thread Congxian Qiu
Hi Bekir, First, the e2e time for a subtask is $ack_time_received_in_JM - $trigger_time_in_JM. A checkpoint includes several steps on the task side: 1) receiving the first barrier; 2) barrier alignment (for exactly-once); 3) the sync part of the operator snapshot; 4) the async part of the operator snapshot. The images you

Re: instable checkpointing after migration to flink 1.8 (production issue)

2019-07-18 Thread Bekir Oguz
Hi Congxian, Starting from this morning we have more issues with checkpointing in production. What we see is that the sync and async durations for some subtasks are very long, but what is strange is that the total of the sync and async durations is much less than the total end-to-end duration. Please check the follow
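
To make the arithmetic discussed in this thread concrete, here is a minimal sketch in Python. It assumes the e2e time for a subtask is the ack time received in the JM minus the trigger time in the JM, as stated in the reply, and that whatever is left after subtracting the sync and async durations was spent in the other task-side steps (waiting for the first barrier, barrier alignment). The class name, field names, and sample numbers are hypothetical and are not taken from Flink or from the reported job.

```python
from dataclasses import dataclass


@dataclass
class SubtaskCheckpointStats:
    """Hypothetical container for one subtask's checkpoint timings (milliseconds)."""

    trigger_time_in_jm: int       # when the JobManager triggered the checkpoint
    ack_time_received_in_jm: int  # when the JobManager received this subtask's ack
    sync_duration: int            # sync part of the operator snapshot
    async_duration: int           # async part of the operator snapshot

    def end_to_end_duration(self) -> int:
        # e2e time = ack time received in JM - trigger time in JM
        return self.ack_time_received_in_jm - self.trigger_time_in_jm

    def unaccounted_duration(self) -> int:
        # Time not covered by the sync and async snapshot parts. Under the
        # assumption above, this is spent waiting for the first barrier and
        # aligning barriers, not in the snapshot itself.
        return self.end_to_end_duration() - (self.sync_duration + self.async_duration)


# Made-up sample values, chosen only to illustrate the gap described above.
stats = SubtaskCheckpointStats(
    trigger_time_in_jm=1_000,
    ack_time_received_in_jm=61_000,
    sync_duration=2_000,
    async_duration=5_000,
)
print(stats.end_to_end_duration())   # 60000
print(stats.unaccounted_duration())  # 53000
```

A large unaccounted value for a subtask would suggest looking at barrier arrival and alignment rather than at the snapshot steps, though that reading depends on the assumption stated above.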