Re: DuplicateJobSubmissionException on restart after taskmanagers crash

2023-01-21 Thread Gyula Fóra
Hi Javier, I will try to look into this, as I have not personally seen this problem while using the operator. It would be great if you could reach out to me on Slack or by email directly so we can discuss the issue and get to the bottom of it. Cheers, Gyula On Fri, 20 Jan 2023 at 23:53, Javier Vegas

Re: DuplicateJobSubmissionException on restart after taskmanagers crash

2023-01-20 Thread Javier Vegas
My issue is described in https://issues.apache.org/jira/browse/FLINK-21928, which says it was fixed in 1.14, but I am still seeing the problem. However, the ticket also says: "Additionally, it is still required that the user cleans up the corresponding HA entries for the running jobs registry because thes

DuplicateJobSubmissionException on restart after taskmanagers crash

2023-01-20 Thread Javier Vegas
I have a Flink app (Flink 1.16.0, deployed to Kubernetes via operator 1.3.1 and using Kubernetes high availability with storage in S3) that depends on multiple Thrift services for data queries. When one of those services is down (or throws exceptions), the Flink job managers end up crashing and only