Hi Dan,

I think you can see the details of the checkpoints in the checkpoint UI [1].
Also, if the pending checkpoints show that some tasks have not taken their
snapshot, you might check whether those tasks are backpressuring the
previous tasks [2].
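
If the UI is not enough, the same checkpoint statistics should also be
available from the JobManager REST API under /jobs/<jobid>/checkpoints.
Below is a minimal sketch, not a tested tool: the helper names are made up
for illustration, and the JSON field names (history, end_to_end_duration,
num_subtasks, num_acknowledged_subtasks) are what I expect the response to
contain, so please check them against your Flink version.

```python
import json
from urllib.request import urlopen


def summarize_checkpoints(stats):
    """Return one row per checkpoint found in the response's history list.

    `stats` is the parsed JSON from /jobs/<jobid>/checkpoints. The field
    names used here are assumptions about that response.
    """
    rows = []
    for cp in stats.get("history", []):
        total = cp.get("num_subtasks", 0)
        acked = cp.get("num_acknowledged_subtasks", 0)
        rows.append({
            "id": cp.get("id"),
            "status": cp.get("status"),
            "duration_ms": cp.get("end_to_end_duration"),
            "state_size": cp.get("state_size"),
            # Subtasks that have not acknowledged yet are the ones worth
            # checking for backpressure.
            "pending_subtasks": total - acked,
        })
    return rows


def fetch_checkpoint_stats(base_url, job_id):
    """Fetch checkpoint statistics for one job from the JobManager."""
    with urlopen(f"{base_url}/jobs/{job_id}/checkpoints") as resp:
        return json.load(resp)


# Example, assuming a JobManager at http://localhost:8081 and a real job id:
#   for row in summarize_checkpoints(
#           fetch_checkpoint_stats("http://localhost:8081", "<jobid>")):
#       print(row)
```

Running this periodically and appending the rows to a file would show when
the durations start to grow. As far as I know, the number of checkpoints
kept in the history is set by web.checkpoints.history (default 10), so
raising that value is another option.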

Best,
Yun



[1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/monitoring/checkpoint_monitoring.html
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/monitoring/back_pressure.html
------------------------------------------------------------------
Sender:Dan Hill<quietgol...@gmail.com>
Date:2021/03/02 04:34:56
Recipient:user<user@flink.apache.org>
Subject:Debugging long Flink checkpoint durations

Hi.  Are there good ways to debug long Flink checkpoint durations?

I'm running a backfill job that processes ~10 days of data, and then
checkpointing starts failing.  Since I only see the last 10 checkpoints in
the jobmaster UI, I can't see when the failures start.

I looked through the text logs and didn't see much.

I assume:
1) I have something misconfigured that is causing old state to stick around.
2) I don't have enough resources.
