Re: Integrating with Airflow

2017-05-30 Thread Erik Schuchmann
We have begun experimenting with an Airflow/Zeppelin integration. We use the first paragraph of a note to define dependencies and outputs; names and owners; and the schedule for the note. There are utility functions (in Scala) available that provide a data catalog for retrieving data sources. These f

Re: Integrating with Airflow

2017-05-19 Thread Ben Vogan
Thanks for sharing this Ruslan - I will take a look. I agree that paragraphs can form tasks within a DAG. My point was that ideally a DAG could encompass multiple notes. I.e. the completion of one note triggers another and so on to complete an entire chain of dependent tasks. For example team A
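As a rough illustration of the "one note triggers another" idea, here is a minimal sketch that orders dependent notes with Python's standard `graphlib` module. The note names and the dependency mapping are hypothetical; in a real setup the scheduling would live in Airflow, as discussed in this thread, not in a standalone script.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical notes, each mapped to the set of notes it depends on.
note_dependencies = {
    "note_a": set(),
    "note_b": {"note_a"},
    "note_c": {"note_b"},
}


def run_order(dependencies):
    """Return the notes in an order where every note follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for note in run_order(note_dependencies):
        # A real pipeline would trigger the note here instead of printing.
        print("would run:", note)
```

A cycle between notes makes `static_order()` raise `graphlib.CycleError`, which matches the constraint that a chain of notes has to form a DAG.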

Re: Integrating with Airflow

2017-05-19 Thread Ruslan Dautkhanov
Thanks for sharing this Ben. I agree Zeppelin is a better fit with tighter integration with Spark and built-in visualizations. We have pretty much standardized on PySpark, so here's one of the scripts we use internally to extract %pyspark, %sql and %md paragraphs into a standalone script (that ca
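The script referred to above is not included in this preview. The following is only a minimal sketch of the general idea, not Ruslan's script. It assumes the note is exported as JSON with a top-level `paragraphs` list whose entries carry a `text` field, and that the generated script runs somewhere a SparkSession named `spark` already exists. The function name `note_to_script` is made up for illustration.

```python
import json


def note_to_script(note_json: str) -> str:
    """Turn an exported Zeppelin note (JSON text) into PySpark source text."""
    note = json.loads(note_json)
    chunks = []
    for paragraph in note.get("paragraphs", []):
        text = (paragraph.get("text") or "").strip()
        if not text.startswith("%"):
            continue  # this sketch skips paragraphs without an explicit interpreter
        parts = text.split(None, 1)
        interpreter = parts[0]
        body = parts[1] if len(parts) > 1 else ""
        if interpreter == "%pyspark":
            # Python code is copied through unchanged.
            chunks.append(body)
        elif interpreter == "%sql":
            # Assumption: a SparkSession called `spark` exists at run time.
            chunks.append("spark.sql(" + repr(body) + ").show()")
        elif interpreter == "%md":
            # Markdown becomes comments so the documentation is kept.
            chunks.append(
                "\n".join(("# " + line).rstrip() for line in body.splitlines())
            )
        # Any other interpreter (for example %sh) is dropped.
    return "\n\n".join(chunks) + "\n"
```

Calling `note_to_script(open("note.json").read())` returns the script as a string, which can then be written to a file and submitted like any other PySpark job.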

Re: Integrating with Airflow

2017-05-19 Thread Ben Vogan
I do not expect the relationship between DAGs to be described in Zeppelin - that would be done in Airflow. It just seems that Zeppelin is such a great tool for a data scientist's workflow that it would be nice if, once they are done with the work, the note could be productionized directly. I could e

Re: Integrating with Airflow

2017-05-19 Thread Ruslan Dautkhanov
We also use both Zeppelin and Airflow. I'm interested in hearing what others are doing here too. Although honestly there might be some challenges - Airflow expects a DAG structure, while a notebook has a pretty linear structure; - Airflow is Python-based; Zeppelin is all Java (REST API might be of