Hi Congxian,

Starting this morning, we have been seeing more checkpointing issues in production. The sync and async durations for some subtasks are very long, but what is strange is that the sum of the sync and async durations is much less than the total end-to-end duration. Please check the following snapshot:
For example, for subtask 14 the sync duration is 4 mins and the async duration is 3 mins, yet the end-to-end duration is 53 mins! We have a very long checkpoint timeout (1 hour), but many checkpoints are still failing: some subtasks cannot finish checkpointing within 1 hour.

We would really appreciate your help here; this is a critical production problem for us at the moment.

Regards,
Bekir

> On 17 Jul 2019, at 17:46, Bekir Oguz <bekir.o...@persgroep.net> wrote:
>
> And I also extracted events fr