Here at Globo.com we use Airflow to schedule and manage our Spark pipeline.
We use the YARN API in the Airflow DAGs to control things, for example to
guarantee that a job is not already running before starting another batch.
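
To give an idea of the kind of check I mean, here is a rough sketch, not our
actual DAG code. It asks the YARN ResourceManager REST API
(/ws/v1/cluster/apps) which applications are RUNNING or ACCEPTED and looks for
the job by name. The RM_URL host and the function names are made up for the
example, and it only applies if the jobs run on YARN, not on a standalone
Spark master.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical ResourceManager address (assumption); replace with your own.
RM_URL = "http://resourcemanager.example.com:8088"


def running_app_names(payload):
    """Return the set of application names in a /ws/v1/cluster/apps response.

    When no application matches, YARN returns {"apps": null}, so both the
    "apps" and "app" levels are treated as optional.
    """
    apps = (payload.get("apps") or {}).get("app") or []
    return {app["name"] for app in apps}


def is_job_running(job_name, rm_url=RM_URL, timeout=10):
    """Ask the ResourceManager whether a job with this name is still active."""
    query = urlencode({"states": "RUNNING,ACCEPTED"})
    with urlopen(f"{rm_url}/ws/v1/cluster/apps?{query}", timeout=timeout) as resp:
        payload = json.load(resp)
    return job_name in running_app_names(payload)
```

In a DAG, something like this could be the python_callable of a
ShortCircuitOperator placed before the spark-submit task, returning
`not is_job_running("my-batch")` so the batch is skipped while the previous
run is still active. That is one possible wiring, not the only one.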

Tiago Albineli Motta
Software Developer - Globo.com
ICQ: 32107100
http://programandosemcafeina.blogspot.com

On Tue, Nov 29, 2016 at 8:00 PM, Bruno Faria <brunocf...@hotmail.com> wrote:

> I have a standalone Spark cluster and have some jobs scheduled using
> crontab.
>
> It works, but I don't have any real-time monitoring to get emails or to
> control a flow, for example.
>
> I thought about using the Spark "hidden" API to have better control, but it
> seems the API is not officially documented and I don't see much discussion
> about it on the web.
>
> Another option would be Oozie, but it looks like Oozie only works with
> Hadoop, so I'd need to install it and change my architecture.
>
> Is there any other option you suggest?
>
> I'm using only open source versions (no dist)
>
> Thanks
