Hi Feng,

Does Airflow allow remote submission of Spark jobs via spark-submit?
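Something like this minimal sketch is what I have in mind, assuming Airflow's
BashOperator can shell out to a locally installed spark-submit that points at a
remote master (the master URL, class name, and jar path below are made up, and
the snippet is untested):

    # Hypothetical Airflow DAG wrapping spark-submit in a BashOperator.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG(
        dag_id="spark_daily",
        start_date=datetime(2015, 11, 1),
        schedule_interval="@daily",
    )

    # One task: shell out to spark-submit against a remote master.
    submit_job = BashOperator(
        task_id="submit_spark_job",
        bash_command=(
            "spark-submit "
            "--master spark://spark-master.example.com:7077 "
            "--class com.example.MyJob "
            "/path/to/my-job.jar"
        ),
        dag=dag,
    )

In other words, the Spark client would live on the Airflow worker and the job
itself would run on the remote cluster. Is that how you use it?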
On Wed, Nov 18, 2015 at 6:01 PM, Fengdong Yu <fengdo...@everstring.com> wrote:

> Hi,
>
> We use 'Airflow' as our job workflow scheduler.
>
>
> On Nov 19, 2015, at 9:47 AM, Vikram Kone <vikramk...@gmail.com> wrote:
>
> Hi Nick,
> Quick question about the spark-submit command executed from Azkaban with
> the command job type.
> I see that when I press kill in the Azkaban portal on a spark-submit job, it
> doesn't actually kill the application on the Spark master, and it continues
> to run even though Azkaban thinks it's killed.
> How do you get around this? Is there a way to kill spark-submit jobs from
> the Azkaban portal?
>
> On Fri, Aug 7, 2015 at 10:12 AM, Nick Pentreath <nick.pentre...@gmail.com>
> wrote:
>
>> Hi Vikram,
>>
>> We use Azkaban (2.5.0) for our production workflow scheduling. We just use
>> local mode deployment, and it is fairly easy to set up. It is pretty easy
>> to use and has a nice scheduling and logging interface, as well as SLAs
>> (e.g., kill the job and notify if it doesn't complete within 3 hours).
>>
>> However, Spark support is not present directly; we run everything with
>> shell scripts and spark-submit. There is a plugin interface where one
>> could create a Spark plugin, but I found it very cumbersome when I
>> investigated, and I didn't have the time to work through developing one.
>>
>> It has some quirks, and while there is actually a REST API for adding jobs
>> and dynamically scheduling jobs, it is not documented anywhere, so you
>> kind of have to figure it out for yourself. But in terms of ease of use I
>> found it way better than Oozie. I haven't tried Chronos, which seemed
>> quite involved to set up. I haven't tried Luigi either.
>>
>> Spark Job Server is good, but as you say it lacks some things like
>> scheduling and DAG-type workflows (independent of Spark-defined job
>> flows).
>>
>>
>> On Fri, Aug 7, 2015 at 7:00 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>
>>> Also check Falcon in combination with Oozie.
>>>
>>> On Fri, Aug 7, 2015 at 5:51 PM, Hien Luu <h...@linkedin.com.invalid>
>>> wrote:
>>>
>>>> Looks like Oozie can satisfy most of your requirements.
>>>>
>>>>
>>>> On Fri, Aug 7, 2015 at 8:43 AM, Vikram Kone <vikramk...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>> I'm looking for open source workflow tools/engines that allow us to
>>>>> schedule Spark jobs on a DataStax Cassandra cluster. Since there are
>>>>> tonnes of alternatives out there like Oozie, Azkaban, Luigi, Chronos,
>>>>> etc., I wanted to check with people here to see what they are using
>>>>> today.
>>>>>
>>>>> Some of the requirements of the workflow engine that I'm looking for
>>>>> are:
>>>>>
>>>>> 1. First-class support for submitting Spark jobs on Cassandra, not
>>>>> some wrapper Java code to submit tasks.
>>>>> 2. Active open source community support, and well tested at
>>>>> production scale.
>>>>> 3. Should be dead easy to write job dependencies using XML or a web
>>>>> interface, e.g., job A depends on job B and job C, so run job A after
>>>>> B and C are finished. We shouldn't need to write full-blown Java
>>>>> applications to specify job parameters and dependencies. It should be
>>>>> very simple to use (see the sketch after this list).
>>>>> 4. Time-based recurrent scheduling: run the Spark jobs at a given
>>>>> time every hour, day, week, or month.
>>>>> 5. Job monitoring, alerting on failures, and email notifications on
>>>>> a daily basis.
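>>>>>
>>>>> To give a concrete (hypothetical) picture of what I mean in point 3,
>>>>> something on the level of an Azkaban-style command job file, where
>>>>> the class name, jar path, and job names are made up:
>>>>>
>>>>>     # jobA.job: run only after jobB and jobC succeed.
>>>>>     type=command
>>>>>     command=spark-submit --class com.example.JobA /path/to/app.jar
>>>>>     dependencies=jobB,jobC
>>>>>
>>>>> That is the whole dependency declaration, with no Java wrapper code.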
>>>>>
>>>>> I have looked at Ooyala's Spark Job Server, which seems to be geared
>>>>> towards making Spark jobs run faster by sharing contexts between the
>>>>> jobs, but it isn't a full-blown workflow engine per se. A combination
>>>>> of Spark Job Server and a workflow engine would be ideal.
>>>>>
>>>>> Thanks for the inputs
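P.S. On my earlier question in this thread about Azkaban's kill button not
stopping the Spark application: the workaround we are experimenting with is to
launch spark-submit through a small wrapper that forwards the termination
signal to the whole process group. A minimal, untested sketch; note that this
only helps in client mode, where the driver runs inside the spark-submit
process (in cluster mode you would have to ask the master to kill the driver
instead):

    #!/usr/bin/env python
    # run_spark.py (hypothetical): wrapper around spark-submit.
    # When the scheduler kills this process, forward the signal to the
    # whole child process group so the driver dies instead of being
    # orphaned.
    import os
    import signal
    import subprocess
    import sys

    # Launch spark-submit in its own process group, passing our
    # arguments straight through.
    proc = subprocess.Popen(["spark-submit"] + sys.argv[1:],
                            preexec_fn=os.setsid)

    def forward(signum, frame):
        # Propagate the signal to every process in the child's group.
        os.killpg(os.getpgid(proc.pid), signal.SIGTERM)

    signal.signal(signal.SIGTERM, forward)
    signal.signal(signal.SIGINT, forward)

    sys.exit(proc.wait())

The Azkaban command job would then invoke, e.g.,
command=python run_spark.py --master spark://... --class com.example.MyJob app.jar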