Re: [Question] check if pipeline is still running in pipeline runner

2023-07-10 Thread Jan Lukavský
Hi, when the JM goes down, it should be brought back up (if it is configured for HA, running on k8s, ...), and it should recover all running jobs. If this does not happen, then either: a) the JM is not in an HA configuration, or b) it is unable to recover after the failure, which typically means tha

Re: [Question] check if pipeline is still running in pipeline runner

2023-07-07 Thread Lydian
I am using the Lyft Flink operator (on k8s), and it is able to monitor the submitted job status for us. It shows both cluster and job health. The issue we’ve seen so far is that sometimes the task keeps failing and retrying, but this is not detected by the Flink operator. However, Flink itself co
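Restart loops like this one can often be spotted from Flink's job-level `numRestarts` metric, even when a poll happens to catch the job in the RUNNING state. A minimal sketch in Python, assuming the JobManager REST API is reachable at some `base_url` and that your Flink version exposes `numRestarts` via `GET /jobs/<jid>/metrics?get=numRestarts`; the helper names and the threshold are hypothetical, not part of the Flink operator or Beam.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def parse_num_restarts(metrics: List[Dict[str, str]]) -> Optional[int]:
    """Pick numRestarts out of a /jobs/<jid>/metrics?get=numRestarts response body."""
    for entry in metrics:
        if entry.get("id") == "numRestarts":
            # Metric values come back as strings in the REST response.
            return int(float(entry["value"]))
    return None  # metric not reported (e.g. a Flink version that lacks it)


def fetch_num_restarts(base_url: str, job_id: str, timeout: float = 5.0) -> Optional[int]:
    """Query the JobManager REST API for the job-level numRestarts metric."""
    url = f"{base_url}/jobs/{job_id}/metrics?get=numRestarts"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_num_restarts(json.load(resp))


def restarts_exceeded(previous: int, current: int, max_new_restarts: int) -> bool:
    """True if the job restarted more than max_new_restarts times between two polls."""
    return current - previous > max_new_restarts


if __name__ == "__main__":
    # Offline example with a canned response instead of a live cluster.
    print(parse_num_restarts([{"id": "numRestarts", "value": "4"}]))  # 4
    print(restarts_exceeded(previous=1, current=4, max_new_restarts=2))  # True
```

Comparing two samples taken one poll interval apart, rather than alerting on the absolute count, avoids flagging restarts that happened long ago.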

Re: [Question] check if pipeline is still running in pipeline runner

2023-07-07 Thread Adlae D'Orazio
Hi Jan, Thank you for your response! Apologies that this wasn't clear, but we're actually looking at what would happen if the job server *were* to go down. So what we are more interested in is understanding *how* to actually monitor that the job is running. We won't know the job id so we can't use
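Since the job ID is not known, one option is to look the job up by name on the Flink side rather than through the Beam job server. A minimal sketch, assuming the driver submits the pipeline under a stable, known job name and can reach the Flink JobManager REST API; `FLINK_REST_URL` and the helper names are hypothetical. `GET /jobs/overview` lists the jobs the JobManager knows about, each with its `jid`, `name`, and `state`.

```python
import json
import urllib.request
from typing import List

# Hypothetical default: point this at wherever the JobManager REST endpoint is exposed.
FLINK_REST_URL = "http://localhost:8081"


def running_jobs_named(overview: dict, job_name: str) -> List[str]:
    """Return IDs of jobs in a /jobs/overview response that match job_name and are RUNNING."""
    return [
        job["jid"]
        for job in overview.get("jobs", [])
        if job.get("name") == job_name and job.get("state") == "RUNNING"
    ]


def fetch_overview(base_url: str = FLINK_REST_URL, timeout: float = 5.0) -> dict:
    """GET <base_url>/jobs/overview and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/jobs/overview", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Offline example with a canned response; in practice use fetch_overview().
    sample = {"jobs": [
        {"jid": "a1", "name": "my-pipeline", "state": "RUNNING"},
        {"jid": "b2", "name": "my-pipeline", "state": "FAILED"},
    ]}
    print(running_jobs_named(sample, "my-pipeline"))  # ['a1']
```

If more than one ID comes back for the same name, the pipeline was probably submitted twice, which is worth alerting on in its own right.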

Re: [Question] check if pipeline is still running in pipeline runner

2023-07-07 Thread Jan Lukavský
Hi, if I understand correctly, you have a 'program runner' (sometimes called a driver), which is supposed to be long-running and to watch whether the submitted Pipeline is running. If it is not, the driver resubmits the job. If my understanding is correct, I would suggest looking into the reasons
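The driver pattern described here boils down to a poll-and-resubmit loop. A minimal sketch, assuming `is_running` and `submit` are hypothetical callables you supply (for example, a lookup against the Flink JobManager REST API and a call that re-runs the Beam pipeline); none of these names come from Beam or Flink.

```python
import time
from typing import Callable, Optional


def watch_and_resubmit(
    is_running: Callable[[], bool],
    submit: Callable[[], None],
    poll_interval_s: float = 30.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll is_running(); call submit() whenever the pipeline is not running.

    Returns the number of resubmissions. max_polls=None loops forever, as a
    long-running driver would; a finite value is handy for testing.
    """
    resubmissions = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        if not is_running():
            submit()
            resubmissions += 1
        polls += 1
        sleep(poll_interval_s)
    return resubmissions


if __name__ == "__main__":
    # Fake status sequence: the pipeline "dies" once on the second poll.
    states = iter([True, False, True, True])
    count = watch_and_resubmit(
        is_running=lambda: next(states),
        submit=lambda: print("resubmitting"),
        poll_interval_s=0,
        max_polls=4,
    )
    print(count)  # 1
```

One caveat: if the JobManager runs in HA mode and is in the middle of recovering its jobs, `is_running` may briefly report False, and a naive resubmit would then start a duplicate pipeline. Requiring several consecutive negative polls before resubmitting is one way to reduce that risk.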