You're probably right Chesnay, I just asked on this mailing list to see whether there were any pointers or blog posts about this topic from a Flink perspective. Then the conversation went in the wrong direction.
Best,
Flavio

On Tue, Feb 2, 2021 at 12:48 PM Chesnay Schepler <[email protected]> wrote:

> I'm sorry, but aren't these questions better suited for the Airflow mailing
> lists?
>
> On 2/2/2021 12:35 PM, Flavio Pompermaier wrote:
>
> Thank you all for the hints. However, looking at the REST API [1] of AirFlow
> 2.0 I can't find how to set up my DAG (if this is the right concept).
> Do I need to first create a Connection? A DAG? A TaskInstance? How do I
> specify the 2 BashOperators?
> I was thinking of connecting to AirFlow via Java, so I can't use the Python
> API.
>
> [1]
> https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#section/Overview
>
> On Tue, Feb 2, 2021 at 10:53 AM Arvid Heise <[email protected]> wrote:
>
>> Hi Flavio,
>>
>> If you know a bit of Python, it's also trivial to add a new Flink
>> operator where you can use the REST API.
>>
>> In general, I'd consider Airflow to be the best choice for your problem,
>> especially if it gets more complicated in the future (do something else if
>> the first job fails).
>>
>> If you have specific questions, feel free to ask.
>>
>> Best,
>>
>> Arvid
>>
>> On Tue, Feb 2, 2021 at 10:08 AM 姜鑫 <[email protected]> wrote:
>>
>>> Hi Flavio,
>>>
>>> I probably understand what you need. Apache AirFlow is a scheduling
>>> framework in which you can define your own dependent operators, so you
>>> can define a BashOperator to submit a Flink job to your local Flink
>>> cluster. For example:
>>> ```
>>> t1 = BashOperator(
>>>     task_id='flink-wordcount',
>>>     bash_command='./bin/flink run flink/build-target/examples/batch/WordCount.jar',
>>>     ...
>>> )
>>> ```
>>> Also, Airflow supports submitting jobs to Kubernetes, and you can even
>>> implement your own operator if a bash command doesn't meet your demands.
>>>
>>> Indeed, Flink AI (flink-ai-extended
>>> <https://github.com/alibaba/flink-ai-extended>?) needs an enhanced
>>> version of AirFlow, but it is mainly for streaming scenarios, which means
>>> the job won't stop. In your case, where all jobs are batch jobs, it
>>> doesn't help much. Hope this helps.
>>>
>>> Regards,
>>> Xin
>>>
>>>
>>> On Feb 2, 2021, at 4:30 PM, Flavio Pompermaier <[email protected]> wrote:
>>>
>>> Hi Xin,
>>> let me state first that I have never used AirFlow, so I probably miss
>>> some background here.
>>> I just want to externalize the job scheduling to some consolidated
>>> framework, and from what I see Apache AirFlow is probably what I need.
>>> However, I can't find any good blog post or documentation about how to
>>> integrate these 2 technologies using the REST APIs of both services.
>>> I saw that Flink AI decided to use a customized/enhanced version of
>>> AirFlow [1], but I didn't look into the code to understand how they use it.
>>> In my use case I just want to schedule 2 Flink batch jobs using the REST
>>> API of AirFlow, where the second one is fired after the first.
>>>
>>> [1]
>>> https://github.com/alibaba/flink-ai-extended/tree/master/flink-ai-flow
>>>
>>> Best,
>>> Flavio
>>>
>>> On Tue, Feb 2, 2021 at 2:43 AM 姜鑫 <[email protected]> wrote:
>>>
>>>> Hi Flavio,
>>>>
>>>> Could you explain what your direct question is? In my opinion, it is
>>>> possible to define two Airflow operators to submit dependent Flink jobs,
>>>> as long as the first one can reach the end.
>>>>
>>>> Regards,
>>>> Xin
>>>>
>>>> On Feb 1, 2021, at 6:43 PM, Flavio Pompermaier <[email protected]> wrote:
>>>>
>>>> Any advice here?
>>>>
>>>> On Wed, Jan 27, 2021 at 9:49 PM Flavio Pompermaier <
>>>> [email protected]> wrote:
>>>>
>>>>> Hello everybody,
>>>>> is there any suggested way/pointer to schedule Flink jobs using Apache
>>>>> AirFlow?
>>>>> What I'd like to achieve is the submission (using the REST API of
>>>>> AirFlow) of 2 jobs, where the second one can be executed only if the
>>>>> first one succeeds.
>>>>>
>>>>> Thanks in advance
>>>>> Flavio
>>>>>
>>>>
>>>
>
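For reference, Xin's suggestion spelled out as a complete DAG file could look roughly like the sketch below. This is only an illustration, not something from the thread: the DAG id, file path, Flink home and job jars are made-up placeholders. Note that in Airflow 2.0 the DAG itself is not created through the REST API; it is a Python file placed in the DAGs folder, and the REST API is then used to trigger runs of it. The `first_job >> second_job` dependency is what makes the second Flink job run only if the first one succeeds.

```
# dags/flink_batch_pipeline.py -- a minimal sketch; the DAG id, Flink home
# path and job jars are placeholders, not values from the thread.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="flink_batch_pipeline",
    start_date=datetime(2021, 2, 1),
    schedule_interval=None,  # run only when triggered (e.g. via the REST API)
    catchup=False,
) as dag:
    # First Flink batch job, submitted with the CLI in attached mode so the
    # task succeeds only when the Flink job itself finishes successfully.
    first_job = BashOperator(
        task_id="first_flink_job",
        bash_command="/opt/flink/bin/flink run /jobs/first-job.jar",
    )

    # Second Flink batch job, scheduled only after the first one succeeded.
    second_job = BashOperator(
        task_id="second_flink_job",
        bash_command="/opt/flink/bin/flink run /jobs/second-job.jar",
    )

    first_job >> second_job
```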
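And a sketch of how a run of that DAG could then be triggered and monitored through the Airflow 2.0 stable REST API that Flavio linked. It is written in Python here only to keep a single language in the thread; the same two HTTP calls (POST a new dagRun, then GET its state) can be made from Java with any HTTP client. The host, credentials, basic-auth API backend and the `flink_batch_pipeline` DAG id are assumptions carried over from the sketch above.

```
# Trigger the DAG above via the Airflow 2.0 stable REST API and poll its
# state. Host, credentials and dag_id are placeholders; the basic-auth API
# backend is assumed to be enabled.
import time

import requests

AIRFLOW = "http://localhost:8080/api/v1"
AUTH = ("admin", "admin")  # placeholder credentials
DAG_ID = "flink_batch_pipeline"

# Trigger a new DAG run.
run = requests.post(
    f"{AIRFLOW}/dags/{DAG_ID}/dagRuns",
    auth=AUTH,
    json={"conf": {}},
).json()
run_id = run["dag_run_id"]

# Poll until the run (i.e. both Flink jobs) has finished.
while True:
    state = requests.get(
        f"{AIRFLOW}/dags/{DAG_ID}/dagRuns/{run_id}", auth=AUTH
    ).json()["state"]
    if state in ("success", "failed"):
        break
    time.sleep(10)

print(f"DAG run {run_id} finished with state: {state}")
```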
