Debugging long Flink checkpoint durations

Dan Hill Mon, 01 Mar 2021 12:35:21 -0800

Hi.  Are there good ways to debug long Flink checkpoint durations?

I'm running a backfill job that processes ~10 days of data, after which
checkpoints start failing.  Since I only see the last 10 checkpoints in the
jobmaster UI, I can't tell when the failures begin.
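
One possible way to capture the point where checkpoints start to degrade is to
poll the Flink REST endpoint /jobs/<jobid>/checkpoints periodically and append
each finished checkpoint to a file, so the record outlives the short history
shown in the UI. The sketch below is only an illustration: the function names,
the base URL, the output path and the polling interval are all assumptions,
and the "history" field names should be checked against the REST API docs for
the Flink version in use. As far as I know, the length of that history is set
by the web.checkpoints.history option (default 10), so raising it may be a
simpler alternative.

```python
import csv
import json
import time
import urllib.request


def new_checkpoint_rows(payload, seen_ids):
    """Return (id, status, duration_ms, state_size_bytes) rows for finished
    checkpoints in the payload's "history" list that were not seen before.
    seen_ids is updated in place."""
    rows = []
    for cp in payload.get("history", []):
        # Skip checkpoints that are still running or already recorded.
        if cp["status"] == "IN_PROGRESS" or cp["id"] in seen_ids:
            continue
        seen_ids.add(cp["id"])
        rows.append((cp["id"], cp["status"],
                     cp.get("end_to_end_duration"), cp.get("state_size")))
    return sorted(rows)


def poll(base_url, job_id, out_path, interval_s=60):
    """Hypothetical helper: poll the checkpoint stats forever and append new
    rows to a CSV file. base_url is assumed to be the JobManager REST address,
    for example "http://localhost:8081"."""
    seen = set()
    url = f"{base_url}/jobs/{job_id}/checkpoints"
    while True:
        with urllib.request.urlopen(url) as resp:
            payload = json.load(resp)
        with open(out_path, "a", newline="") as f:
            csv.writer(f).writerows(new_checkpoint_rows(payload, seen))
        time.sleep(interval_s)
```

Running something like poll("http://localhost:8081", "<jobid>", "checkpoints.csv")
alongside the backfill would leave a file showing how duration and state size
grow before the first FAILED row. The polling interval needs to be short enough
that no checkpoint drops out of the history between two polls.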


I looked through the text logs and didn't see much.

I assume:
1) I have something misconfigured that is causing old state to stick
around.
2) I don't have enough resources.
