Re: Best approach to schedule Spark jobs

2016-11-29 Thread Sandeep Samudrala
Here at InMobi, we use Apache Falcon (with Oozie). The pipelines are fully functional in production. See the Apache Falcon site for more details.

Re: Best approach to schedule Spark jobs

2016-11-29 Thread Tiago Albineli Motta
Here at Globo.com we use Airflow to schedule and manage our Spark pipeline. We call the YARN API from our Airflow DAGs to control things such as guaranteeing that a job is no longer running before another batch starts. Tiago Albineli Motta, Software Developer - Globo.com ICQ: 32107100 http://programandose
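A minimal sketch of the "is the previous batch still running?" check, assuming the YARN ResourceManager REST API (`/ws/v1/cluster/apps?states=RUNNING`) is reachable from the Airflow workers. The host name `resourcemanager:8088` and the application name used below are hypothetical placeholders; this is not Globo.com's actual code.

```python
import json
import urllib.request

# Hypothetical ResourceManager address; adjust to your cluster.
RM_URL = "http://resourcemanager:8088"


def fetch_running_apps(rm_url=RM_URL):
    """Query the YARN ResourceManager REST API for applications in RUNNING state."""
    url = f"{rm_url}/ws/v1/cluster/apps?states=RUNNING"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def is_app_running(payload, app_name):
    """Return True if any application in the parsed payload has the given name.

    YARN typically returns {"apps": null} when no application matches the
    filter, so a missing or null "apps" field is treated as an empty list.
    """
    apps = payload.get("apps") or {}
    return any(app.get("name") == app_name for app in apps.get("app", []))
```

Inside a DAG, a callable such as `lambda: not is_app_running(fetch_running_apps(), "my_spark_batch")` (hypothetical name) could back an Airflow `ShortCircuitOperator`, so downstream tasks are skipped while the previous batch is still alive. Keeping the parsing separate from the HTTP call makes the check easy to test without a cluster.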

Best approach to schedule Spark jobs

2016-11-29 Thread Bruno Faria
I have a standalone Spark cluster with some jobs scheduled via crontab. It works, but I lack real-time monitoring, such as email alerts, and any way to control a flow. I thought about using Spark's "hidden" API for better control, but it seems the API is not officially documented
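A minimal sketch of hardening a plain crontab setup without a full scheduler, assuming a POSIX host (it uses `fcntl.flock`). The wrapper skips a cron tick if the previous run still holds a lock file, and calls an optional failure callback on a non-zero exit code; the callback is where an email could be sent (for example with `smtplib`). The function name `run_exclusive` and the lock path are hypothetical.

```python
import fcntl
import subprocess


def run_exclusive(cmd, lock_path, on_failure=None):
    """Run cmd only if no other run holds lock_path; report non-zero exits.

    Returns the command's exit code, or None if a previous run still holds
    the lock and this invocation was skipped.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fails at once if another run holds it.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return None  # previous batch still running; skip this cron tick
        result = subprocess.run(cmd)
        if result.returncode != 0 and on_failure is not None:
            # Hook for alerting, e.g. sending an email about the failed job.
            on_failure(cmd, result.returncode)
        return result.returncode
    # The lock is released when lock_file is closed at the end of the with block.
```

A crontab entry would then invoke a small script that calls `run_exclusive(["spark-submit", ...], "/tmp/my_job.lock", notify)` instead of calling `spark-submit` directly. This covers overlap prevention and failure alerts only; it does not give the dependency handling that Airflow or Falcon/Oozie provide.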