Re: Data processing pipeline workflow management

2015-03-24 Thread Lars Albertsson
Thanks for the good advice and input, everyone! While we were talking, Pinterest open-sourced their workflow engine: http://engineering.pinterest.com/post/113376157699/open-sourcing-pinball. It looks similar to Luigi in terms of architecture. My current plan is to use Aurora in the manner described in

Re: Data processing pipeline workflow management

2015-03-12 Thread BW
> > > Date: Wednesday, March 11, 2015 at 3:21 PM > To: "dev@aurora.incubator.apache.org" <dev@aurora.incubator.apache.org> > Subject: Re: Data processing pipeline workflow management > > >Hey, > > > >This is a great question. See my comments

Re: Data processing pipeline workflow management

2015-03-11 Thread Mattmann, Chris A (3980)
3:21 PM To: "dev@aurora.incubator.apache.org" Subject: Re: Data processing pipeline workflow management >Hey, > >This is a great question. See my comments inline below. > >On Tue, Mar 10, 2015 at 8:28 AM, Lars Albertsson > >wrote: > >> We are evaluat

Re: Data processing pipeline workflow management

2015-03-11 Thread Zameer Manji
Hey, This is a great question. See my comments inline below. On Tue, Mar 10, 2015 at 8:28 AM, Lars Albertsson wrote: > We are evaluating Aurora as a workflow management tool for batch > processing pipelines. We basically need a tool that regularly runs > batch processes that are connected as pr

Re: Data processing pipeline workflow management

2015-03-11 Thread Bill Farner
I'm afraid in general the use cases you describe are not things that Aurora currently intends to fulfill. Though, that's not to say that you could not do this on top of Aurora if you wanted to. Does anyone have experience with building workflows with Aurora? I do not. I could opine about how o

Data processing pipeline workflow management

2015-03-10 Thread Lars Albertsson
We are evaluating Aurora as a workflow management tool for batch processing pipelines. We basically need a tool that regularly runs batch processes that are connected as producers/consumers of data, typically stored in HDFS or S3. The alternative tools would be Azkaban, Luigi, and Oozie, but I am