Piggy-backing on the thread a little -- does anyone out there use Luigi to manage Spark workflows? I see they recently added Spark support.
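For concreteness, here's roughly the kind of thing I have in mind -- a minimal sketch using Luigi's SparkSubmitTask, where one Spark app depends on another through its HDFS output (I'm assuming the luigi.contrib.spark module; the jar names, classes, and paths below are all made up):

    import luigi
    from luigi.contrib.hdfs import HdfsTarget
    from luigi.contrib.spark import SparkSubmitTask


    class CleanEvents(SparkSubmitTask):
        """First Spark app; its HDFS output is the dependency hook."""
        app = 'clean-events.jar'                   # hypothetical application jar
        entry_class = 'com.example.CleanEvents'    # hypothetical main class
        master = 'yarn-client'

        def app_options(self):
            # Arguments passed to the Spark app: input path, output path.
            return ['/data/raw/events', self.output().path]

        def output(self):
            return HdfsTarget('/data/clean/events')


    class AggregateEvents(SparkSubmitTask):
        """Second Spark app; Luigi runs it only once CleanEvents' output exists."""
        app = 'aggregate-events.jar'               # hypothetical application jar
        entry_class = 'com.example.AggregateEvents'
        master = 'yarn-client'

        def requires(self):
            # App-level dependency: this task consumes CleanEvents' HDFS output.
            return CleanEvents()

        def app_options(self):
            return [self.input().path, self.output().path]

        def output(self):
            return HdfsTarget('/data/agg/events')


    if __name__ == '__main__':
        luigi.run()

Running something like "python workflow.py AggregateEvents --local-scheduler" would then spark-submit both apps in dependency order, skipping any step whose output already exists. Curious whether anyone has run this in anger.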
On Sun, Mar 1, 2015 at 10:20 PM, Qiang Cao <caoqiang...@gmail.com> wrote:

> Thanks, Himanish and Felix!
>
> On Sun, Mar 1, 2015 at 7:50 PM, Himanish Kushary <himan...@gmail.com> wrote:
>
>> We are running our Spark jobs on Amazon AWS and are using AWS Data
>> Pipeline for orchestration of the different Spark jobs. AWS Data Pipeline
>> provides automatic EMR cluster provisioning, retry on failure, SNS
>> notifications, etc. out of the box, and works well for us.
>>
>> On Sun, Mar 1, 2015 at 7:02 PM, Felix C <felixcheun...@hotmail.com> wrote:
>>
>>> We use Oozie as well, and it has worked well.
>>> The catch is that each action in Oozie is separate, and one cannot
>>> retain a SparkContext or RDD, or leverage caching or temp tables, going
>>> into another Oozie action. You could either save output to a file or put
>>> all Spark processing into one Oozie action.
>>>
>>> --- Original Message ---
>>>
>>> From: "Mayur Rustagi" <mayur.rust...@gmail.com>
>>> Sent: February 28, 2015 7:07 PM
>>> To: "Qiang Cao" <caoqiang...@gmail.com>
>>> Cc: "Ted Yu" <yuzhih...@gmail.com>, "Ashish Nigam" <ashnigamt...@gmail.com>, "user" <user@spark.apache.org>
>>> Subject: Re: Tools to manage workflows on Spark
>>>
>>> Sorry, not really. Spork is a way to migrate your existing Pig scripts
>>> to Spark, or to write new Pig jobs that can execute on Spark.
>>> For orchestration you are better off using Oozie, especially if you are
>>> using other execution engines/systems besides Spark.
>>>
>>> Regards,
>>> Mayur Rustagi
>>> Ph: +1 (760) 203 3257
>>> http://www.sigmoid.com <http://www.sigmoidanalytics.com/>
>>> @mayur_rustagi <http://www.twitter.com/mayur_rustagi>
>>>
>>> On Sat, Feb 28, 2015 at 6:59 PM, Qiang Cao <caoqiang...@gmail.com> wrote:
>>>
>>> Thanks Mayur! I'm looking for something that would allow me to easily
>>> describe and manage a workflow on Spark. A workflow in my context is a
>>> composition of Spark applications that may depend on one another based
>>> on HDFS inputs/outputs. Is Spork a good fit? The orchestration I want is
>>> at the app level.
>>>
>>> On Sat, Feb 28, 2015 at 9:38 PM, Mayur Rustagi <mayur.rust...@gmail.com> wrote:
>>>
>>> We do maintain it, but in the Apache repo itself. However, Pig cannot do
>>> orchestration for you. I am not sure what you are looking for from Pig
>>> in this context.
>>>
>>> Regards,
>>> Mayur Rustagi
>>> Ph: +1 (760) 203 3257
>>> http://www.sigmoid.com <http://www.sigmoidanalytics.com/>
>>> @mayur_rustagi <http://www.twitter.com/mayur_rustagi>
>>>
>>> On Sat, Feb 28, 2015 at 6:36 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>> Here was the latest modification in the Spork repo:
>>> Mon Dec 1 10:08:19 2014
>>>
>>> Not sure if it is being actively maintained.
>>>
>>> On Sat, Feb 28, 2015 at 6:26 PM, Qiang Cao <caoqiang...@gmail.com> wrote:
>>>
>>> Thanks for the pointer, Ashish! I was also looking at Spork
>>> (https://github.com/sigmoidanalytics/spork, Pig-on-Spark), but wasn't
>>> sure if that's the right direction.
>>>
>>> On Sat, Feb 28, 2015 at 6:36 PM, Ashish Nigam <ashnigamt...@gmail.com> wrote:
>>>
>>> You have to call spark-submit from Oozie.
>>> I used this link to get the idea for my implementation:
>>>
>>> http://mail-archives.apache.org/mod_mbox/oozie-user/201404.mbox/%3CCAHCsPn-0Grq1rSXrAZu35yy_i4T=fvovdox2ugpcuhkwmjp...@mail.gmail.com%3E
>>>
>>> On Feb 28, 2015, at 3:25 PM, Qiang Cao <caoqiang...@gmail.com> wrote:
>>>
>>> Thanks, Ashish! Is Oozie integrated with Spark? I know it can
>>> accommodate some Hadoop jobs.
>>>
>>> On Sat, Feb 28, 2015 at 6:07 PM, Ashish Nigam <ashnigamt...@gmail.com> wrote:
>>>
>>> Qiang,
>>> Did you look at Oozie?
>>> We use Oozie to run Spark jobs in production.
>>>
>>> On Feb 28, 2015, at 2:45 PM, Qiang Cao <caoqiang...@gmail.com> wrote:
>>>
>>> Hi Everyone,
>>>
>>> We need to deal with workflows on Spark. In our scenario, each workflow
>>> consists of multiple processing steps, and there can be dependencies
>>> among the steps. I'm wondering if there are tools available that can
>>> help us schedule and manage workflows on Spark. I'm looking for
>>> something like Pig on Hadoop, but one that fully functions on Spark.
>>>
>>> Any suggestion?
>>>
>>> Thanks in advance!
>>>
>>> Qiang
>>
>> --
>> Thanks & Regards
>> Himanish
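P.S. On Felix's point above about Oozie actions not being able to share a SparkContext: this is roughly what the "put all Spark processing into one Oozie action" workaround looks like as a single PySpark app, so the intermediate RDD stays cached instead of being round-tripped through HDFS between actions. All paths and names here are made up:

    from pyspark import SparkContext

    sc = SparkContext(appName="all-steps-in-one-action")

    # Step 1: clean the raw input once (hypothetical paths throughout).
    raw = sc.textFile("hdfs:///data/raw/events")
    clean = raw.filter(lambda line: line and not line.startswith("#"))
    clean.cache()  # stays in memory because everything runs in one application

    # Step 2: one aggregation over the cached RDD.
    counts = (clean.map(lambda line: (line.split("\t")[0], 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.saveAsTextFile("hdfs:///data/out/event_counts")

    # Step 3: a second pass over the same cached RDD -- no re-read from HDFS,
    # which is exactly what splitting this across Oozie actions would force.
    errors = clean.filter(lambda line: "ERROR" in line)
    errors.saveAsTextFile("hdfs:///data/out/error_events")

    sc.stop()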