Re: Pipeline manager/scheduler frameworks

2019-02-08 Thread Adeel Ahmad
Airflow would be good, but you will probably have to modify it to support stream processing. Any DAG-based manager would be useful in your case. Luigi works too, but Airflow has a sleeker UI. You could also try StreamSets. GCP provides Composer, which uses Airflow, and Dataflow for Beam. AWS has Glue.
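
To illustrate what "DAG based" means here, a minimal sketch in plain Python follows. It is not Airflow or Luigi code; the task names (extract, transform, load) and the run_pipeline helper are hypothetical, and it only assumes Python 3.9+ for the standard-library graphlib module. Each task runs once all of its upstream tasks have finished.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, deps):
    """Run every task after its upstream tasks and return the execution order.

    tasks: dict mapping a task name to a zero-argument callable (hypothetical tasks).
    deps:  dict mapping a task name to the set of task names it depends on.
    """
    order = []
    # static_order() yields names so that dependencies always come first;
    # it raises graphlib.CycleError if the graph is not a DAG.
    for name in TopologicalSorter(deps).static_order():
        tasks[name]()
        order.append(name)
    return order


# Hypothetical three-step pipeline: extract -> transform -> load.
log = []
tasks = {
    "extract": lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "load": lambda: log.append("load"),
}
deps = {"transform": {"extract"}, "load": {"transform"}}

order = run_pipeline(tasks, deps)
```

A real manager such as Airflow or Luigi adds scheduling, retries, and a UI on top of this ordering idea; the sketch only shows the dependency resolution part.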

Re: Pipeline manager/scheduler frameworks

2019-02-08 Thread Rui Wang
Apache Airflow is a scheduling system that can help manage data pipelines. I have seen Airflow used to manage a few thousand Hive/Spark/Presto pipelines.

-Rui

On Fri, Feb 8, 2019 at 4:08 PM Sridevi Nookala <snook...@parallelwireless.com> wrote:
> Hi,
>
>
> Our analytics app has many data pi