Hi,
if I understand correctly, you have a 'program runner' (sometimes called
a driver), which is supposed to be long-running and to watch whether the
submitted pipeline is still running. If not, the driver resubmits the
job. If my understanding is correct, I would suggest looking into the
reasons why the pipeline terminates in the first place. Flink is
designed to be fault-tolerant after job submission, both for
application-level errors (e.g. transient user-code errors, failures of
external dependencies, etc.) and for the Flink runtime itself (failures
of taskmanagers or the jobmanager). The most common case where this does
not work is some sort of misconfiguration (typically an inability to
restore jobs after a jobmanager failure). Having said that, it is a good
idea to _monitor_ that your job is running (and ideally that it is making
progress, because the mere fact that a job 'runs' does not imply that),
but recovering from a permanently gone job should require manual action.
Simply resubmitting the job is not something I would expect to work well.
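As a rough illustration of such a check (a minimal sketch only; the
JobManager address, the expected job name and the use of the 'requests'
package are assumptions on my side, not part of your setup), you can ask
Flink's REST API for the job overview and look at the reported state:

import requests  # assumption: the 'requests' package is installed

FLINK_REST_URL = "http://localhost:8081"     # placeholder: JobManager REST address
EXPECTED_JOB_NAME = "my-streaming-pipeline"  # placeholder: (part of) the job name

def pipeline_is_running() -> bool:
    # GET /jobs/overview lists every job the cluster knows about,
    # together with its name and current state.
    resp = requests.get(f"{FLINK_REST_URL}/jobs/overview", timeout=10)
    resp.raise_for_status()
    return any(
        EXPECTED_JOB_NAME in job["name"] and job["state"] == "RUNNING"
        for job in resp.json().get("jobs", [])
    )

if __name__ == "__main__":
    if pipeline_is_running():
        print("Job is already running; nothing to do.")
    else:
        print("Job is not running; someone should look into why.")

The same REST API also exposes per-vertex metrics and watermarks, which
is what I would use for the 'is it making progress' part of the check.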
Best,
Jan
On 7/6/23 22:07, Adlae D'Orazio wrote:
Hello,
I am using an Apache Flink cluster to run a streaming pipeline that
I've created using Apache Beam. This streaming pipeline should be the
only one of its type running on the Flink cluster, and I need some
help with how to ensure that is the case.
A Dockerized pipeline runner program submits the streaming pipeline. If
the pipeline exits (e.g. because of an error), the pipeline runner
program exits and is re-run, so that the pipeline is submitted again and
keeps running.
The problem I am running into is that the pipeline runner program can
exit while the streaming pipeline is still running (e.g. because the job
server went down and came back up). In that case, the pipeline runner
program needs to check whether the pipeline is still running or has gone
down before resubmitting it.
My first thought was to set a specific job name that would be visible
through Flink's REST API, so that I could query the REST API for that
name to see whether the job was already running. I'm having trouble
doing this. I seem to be able to set a job name in Beam, but that job
name does not seem to be accessible via Flink's REST API once the
pipeline is run on Flink. While researching this problem, I found this
method
<https://github.com/apache/beam/blob/9a11e28ce79e3b243a13fbf148f2ba26b8c14107/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptions.java#L340>,
which initializes an AppName. That seems promising, but it is written in
Java and I am looking to do it in Python.
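For concreteness, here is roughly how I am submitting the pipeline (a
simplified sketch; the transforms, the Flink master address and the job
name below are placeholders, not my real code):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=FlinkRunner",
    "--flink_master=localhost:8081",     # placeholder: our JobManager address
    "--job_name=my-streaming-pipeline",  # the name I hoped to see in Flink's REST API
    "--streaming",
])

with beam.Pipeline(options=options) as pipeline:
    # placeholder transforms standing in for the real streaming pipeline
    (pipeline | beam.Create(["example"]) | beam.Map(print))

As far as I can tell, the name Flink's REST API reports for the
submitted job is not this job_name, though that may depend on the runner
or job-server version.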
Is there a way to specify the Flink job name via the Beam Python SDK?
Or is there a simpler way to know that a particular Beam pipeline is
running, and therefore not resubmit it?
Please let me know if you have any suggestions - either about how to
execute the approaches I've described or about a simpler solution that I
am overlooking. Thank you for your help!
Best,
Adlae D'Orazio