I think you're right, Till; this is the problem.
In fact, I opened a duplicate Jira ticket in parallel :)
I hope we can fix it in the next 1.12 release.
Regards,
Roman
On Fri, Jan 15, 2021 at 2:09 PM Till Rohrmann wrote:
Thanks for reporting and analyzing this issue Kelly. I think you are indeed
running into a Flink bug. I think the problem is the following: With Flink
1.12.0 [1] we introduced a throttling mechanism for discarding checkpoints.
The way it is implemented is that once a checkpoint is discarded it can
Hi folks,
I recently upgraded to Flink 1.12.0 and I’m hitting an issue where my JM is
crashing while cancelling a job. This is causing Kubernetes readiness probes to
fail, the JM to be restarted, and then to get into a bad state while it tries to
recover itself using ZK + a checkpoint which no longe