Re: Integrating with Airflow

2017-05-30 Thread Erik Schuchmann
We have begun experimenting with an Airflow/Zeppelin integration. We use the first paragraph of a note to define the note's dependencies and outputs, names and owners, and schedule. There are utility functions (in Scala) available that provide a data catalog for retrieving data sources. These f

Re: Integrating with Airflow

2017-05-19 Thread Ben Vogan
Thanks for sharing this, Ruslan - I will take a look. I agree that paragraphs can form tasks within a DAG. My point was that ideally a DAG could encompass multiple notes, i.e. the completion of one note triggers another, and so on, to complete an entire chain of dependent tasks. For example team A
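A minimal sketch of the "one note triggers another" idea, assuming each note is treated as a single task and the dependencies between notes are known up front. The note names below are hypothetical and only illustrate a chain; the ordering uses Python's standard `graphlib` module rather than Airflow itself.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical note names; each key maps to the notes that must finish first.
NOTE_DEPENDENCIES = {
    "note_a": set(),
    "note_b": {"note_a"},
    "note_c": {"note_b"},
}


def run_order(dependencies):
    """Return note names in an order that respects every upstream dependency."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for name in run_order(NOTE_DEPENDENCIES):
        print("would run", name)
```

In Airflow the same edges would typically be declared between tasks (for example with the `>>` operator) and the scheduler would work out the order, so this only shows the shape of the dependency data.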

Re: Integrating with Airflow

2017-05-19 Thread Ruslan Dautkhanov
Thanks for sharing this, Ben. I agree Zeppelin is a better fit with tighter integration with Spark and built-in visualizations. We have pretty much standardized on PySpark, so here's one of the scripts we use internally to extract %pyspark, %sql and %md paragraphs into a standalone script (that ca
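A minimal sketch of this kind of extraction. It is not the script referenced above, which is cut off in this archive. It assumes the note is available as JSON with a top-level `paragraphs` list whose entries carry their source in a `text` field, and that the generated script will run with a SparkSession named `spark` in scope. The function names `split_interpreter` and `note_to_script` are hypothetical.

```python
import json


def split_interpreter(text):
    """Split paragraph text into (interpreter, body).

    A paragraph without a leading %name directive yields an empty interpreter.
    """
    stripped = text.lstrip()
    if not stripped.startswith("%"):
        return "", text
    first, _, rest = stripped.partition("\n")
    parts = first.split(None, 1)
    interpreter = parts[0][1:]
    inline = parts[1] if len(parts) > 1 else ""
    # The body may start on the directive line ("%sql select 1") or below it.
    body = (inline + "\n" + rest) if inline and rest else (inline or rest)
    return interpreter, body


def note_to_script(note):
    """Turn %pyspark, %sql and %md paragraphs into one Python source string."""
    chunks = []
    for paragraph in note.get("paragraphs", []):
        interpreter, body = split_interpreter(paragraph.get("text") or "")
        body = body.strip("\n")
        if not body.strip():
            continue
        if interpreter == "pyspark":
            chunks.append(body)
        elif interpreter == "sql":
            # Assumes a SparkSession called `spark` exists in the target script.
            chunks.append("spark.sql(" + repr(body) + ").show()")
        elif interpreter == "md":
            chunks.append("\n".join("# " + line for line in body.splitlines()))
        # Paragraphs for any other interpreter are skipped.
    return "\n\n".join(chunks) + "\n"


if __name__ == "__main__":
    example = json.loads('{"paragraphs": [{"text": "%pyspark\\nx = 1"}]}')
    print(note_to_script(example))
```

The resulting string can be written to a `.py` file and submitted like any other PySpark job, which is one way a scheduler could run it outside the notebook.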

Re: Integrating with Airflow

2017-05-19 Thread Ben Vogan
I do not expect the relationship between DAGs to be described in Zeppelin - that would be done in Airflow. It just seems that Zeppelin is such a great tool for a data scientist's workflow that it would be nice if, once they are done with the work, the note could be productionized directly. I could e

Re: Integrating with Airflow

2017-05-19 Thread Ruslan Dautkhanov
We also use both Zeppelin and Airflow. I'm interested in hearing what others are doing here too. Although honestly there might be some challenges: - Airflow expects a DAG structure, while a notebook has a pretty linear structure; - Airflow is Python-based; Zeppelin is all Java (REST API might be of
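A minimal sketch of how Python code could reach Zeppelin over REST, assuming the notebook REST API's "run all paragraphs" endpoint, `POST /api/notebook/job/{noteId}`. The base URL and note id are placeholders, the helper names `job_url` and `run_note` are hypothetical, and authentication is not handled.

```python
import json
import urllib.request


def job_url(base_url, note_id):
    """Build the job endpoint URL for one note (endpoint path is an assumption)."""
    return base_url.rstrip("/") + "/api/notebook/job/" + note_id


def run_note(base_url, note_id, opener=urllib.request.urlopen):
    """Ask Zeppelin to run every paragraph of a note and return the parsed reply.

    `opener` is injectable so the call can be exercised without a live server.
    """
    request = urllib.request.Request(
        job_url(base_url, note_id), data=b"", method="POST"
    )
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A function like `run_note` could be wrapped in an Airflow `PythonOperator` so that each note becomes one task. Since the run is started asynchronously, a real task would probably also need to poll the note's job status before reporting success.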

Integrating with Airflow

2017-05-19 Thread Ben Vogan
Hi all, We are really enjoying the workflow of interacting with our data via Zeppelin, but are not sold on using the built-in cron scheduling capability. We would like to be able to create more complex DAGs that are better suited for something like Airflow. I was curious as to whether anyone has