I have followed the migration guide at https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_migration.html#container-cut-off-memory and I am now using taskmanager.memory.flink.size instead of taskmanager.heap.size.

________________________________
From: Deshpande, Omkar <omkar_deshpa...@intuit.com>
Sent: Monday, September 14, 2020 6:23 PM
To: user@flink.apache.org <user@flink.apache.org>
Subject: flink checkpoint timeout

Hello,

I recently upgraded from Flink 1.9 to 1.10. Checkpointing succeeds the first couple of times and then starts failing because of timeouts. The checkpoint duration grows with every checkpoint and eventually exceeds 10 minutes.

I do not see any exceptions in the logs, even with debug logging enabled at the "org.apache.flink" level. Garbage collection seems fine, and there is no backpressure. The same job worked as is on Flink 1.9 without any issue.

Any pointers on how to investigate why the checkpoints take so long to complete?

Omkar
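
For reference, the memory option change mentioned at the top of this thread would look roughly like the following in flink-conf.yaml. This is only a sketch: the 4096m value is a placeholder chosen for illustration and does not come from the original messages.

```yaml
# flink-conf.yaml (illustrative values only, not taken from this thread)

# Flink 1.9 and earlier: option deprecated in 1.10
# taskmanager.heap.size: 4096m

# Flink 1.10: total Flink memory, covering JVM heap plus managed and
# other off-heap memory, so the heap itself ends up smaller than this value
taskmanager.memory.flink.size: 4096m
```

Because the new option covers more than the JVM heap, carrying the old number over unchanged may leave the job with less heap than it had on 1.9.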