
Re: Flink k8s operator unstable deployment

2024-08-31 Thread Naci Simsek

Hi Arthur, In your initial mail, an explicit job ID was set: $internal.pipeline.job-id, 044d28b712536c1d1feed3475f2b8111. This might be the reason for the DuplicateJobSubmissionException. In the job config in your last reply, I could not see such a setting. You could verify from the JM logs that when
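One way to run the check Naci suggests is to scan the JobManager log for the configuration line that carries the job ID. Below is a minimal Python sketch. It assumes the JM logs each loaded property as `Loading configuration property: <key>, <value>`, so verify this against your own logs. The helper name `find_pipeline_job_ids` is hypothetical, and the sample log lines are made up for illustration.

```python
import re

# Assumption: the JM logs loaded config as "Loading configuration property: <key>, <value>".
JOB_ID_PATTERN = re.compile(
    r"Loading configuration property: \$internal\.pipeline\.job-id, ([0-9a-f]{32})"
)


def find_pipeline_job_ids(log_text: str) -> list[str]:
    """Return every explicit pipeline job ID found in the log, in order of appearance."""
    return JOB_ID_PATTERN.findall(log_text)


# Hypothetical excerpt of a JobManager log.
sample_log = """\
INFO  Loading configuration property: jobmanager.memory.process.size, 1600m
INFO  Loading configuration property: $internal.pipeline.job-id, 044d28b712536c1d1feed3475f2b8111
"""

print(find_pipeline_job_ids(sample_log))  # ['044d28b712536c1d1feed3475f2b8111']
```

If the same ID appears in the logs of consecutive JobManagers, the job is being resubmitted under a fixed ID.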

Re: Flink k8s operator unstable deployment

2024-08-30 Thread Arthur Catrisse via user

Hi Naci, Thanks for your answer. We do not explicitly define the job ID. As we are using the flink-kubernetes-operator, I suppose the operator handles this ID. The job is defined in the FlinkDeployment charts, where we have specs for the jobmanager, the taskmanager and the job: job: jarURI: local:/
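To confirm that the FlinkDeployment itself does not pin the ID, you could inspect the resource after loading it into a dict (for example, with a YAML parser). The sketch below assumes the manifest uses the operator's `spec.flinkConfiguration` map and `spec.job` section. The helper `explicit_job_id` and the manifest values are hypothetical placeholders, not the configuration from this thread.

```python
from typing import Optional

JOB_ID_KEY = "$internal.pipeline.job-id"


def explicit_job_id(deployment: dict) -> Optional[str]:
    """Return the job ID pinned in spec.flinkConfiguration, or None if none is set."""
    spec = deployment.get("spec") or {}
    flink_conf = spec.get("flinkConfiguration") or {}
    return flink_conf.get(JOB_ID_KEY)


# Hypothetical FlinkDeployment, already parsed into Python objects.
deployment = {
    "apiVersion": "flink.apache.org/v1beta1",
    "kind": "FlinkDeployment",
    "spec": {
        "flinkConfiguration": {"taskmanager.numberOfTaskSlots": "2"},
        "job": {"jarURI": "local:///path/to/placeholder.jar"},
    },
}

print(explicit_job_id(deployment))  # None
```

A `None` result only shows that the manifest does not set the key. The running JobManager's configuration, as logged, remains the authoritative source.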

Re: Flink k8s operator unstable deployment

2024-08-28 Thread Naci Simsek

Hi Arthur, How do you submit your job? Are you explicitly setting the job ID when submitting the job? Have you also tried without HA to see the behavior? It looks like the job is submitted with the same ID as the previous job, and the job result stored in HA does not let you submit it again with the same job_id. B
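The mechanism Naci describes can be illustrated with a deliberately simplified model: once a result for a job ID has been recorded in HA storage, a new submission that reuses that ID is refused. This is a toy, not Flink's implementation. `ToyJobResultStore` and `ToyDuplicateSubmissionError` are hypothetical names, and in Flink itself the refusal surfaces as the duplicate-submission exception mentioned in this thread.

```python
class ToyDuplicateSubmissionError(Exception):
    """Raised by the toy store when a job ID already has a recorded result."""


class ToyJobResultStore:
    """Simplified stand-in for HA storage that remembers job results by ID."""

    def __init__(self) -> None:
        self._results: dict[str, str] = {}

    def record_result(self, job_id: str, status: str) -> None:
        self._results[job_id] = status

    def submit(self, job_id: str) -> str:
        # Reject any job ID for which a result is already stored.
        if job_id in self._results:
            raise ToyDuplicateSubmissionError(
                f"job {job_id} already has result {self._results[job_id]!r}"
            )
        return f"submitted {job_id}"


store = ToyJobResultStore()
job_id = "044d28b712536c1d1feed3475f2b8111"
store.record_result(job_id, "FINISHED")  # a result for this ID was stored earlier

try:
    store.submit(job_id)  # the next JobManager reuses the same ID
except ToyDuplicateSubmissionError as err:
    print(err)  # job 044d28b712536c1d1feed3475f2b8111 already has result 'FINISHED'

print(store.submit("another-job-id"))  # submitted another-job-id
```

In this model, either a fresh job ID or a cleared result entry lets the submission through. This matches why Naci asks whether the ID is fixed and suggests a run without HA.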

Flink k8s operator unstable deployment

2024-08-28 Thread Arthur Catrisse via user

Hello, We are running into issues when deploying Flink on Kubernetes using the flink-kubernetes-operator with a FlinkDeployment. Occasionally, when a *JobManager* gets rotated out (by Karpenter in our case), the next JobManager is unable to reach a stable state and is stuck in a crash loo