We are in the middle of figuring that out.  At a high level, we want to
combine the best parts of the existing workflow solutions.

On Fri, Aug 7, 2015 at 3:55 PM, Vikram Kone <vikramk...@gmail.com> wrote:

> Hien,
> Is Azkaban being phased out at LinkedIn as rumored? If so, what's LinkedIn
> going to use for workflow scheduling? Is there something else that's going
> to replace Azkaban?
>
> On Fri, Aug 7, 2015 at 11:25 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> In my opinion, when choosing one project among its peers, you should
>> leave enough room for future growth (which may come faster than you
>> initially expect).
>>
>> Cheers
>>
>> On Fri, Aug 7, 2015 at 11:23 AM, Hien Luu <h...@linkedin.com> wrote:
>>
>>> Scalability is a known issue due to the current architecture.  However,
>>> this only becomes relevant if you run more than 20K jobs per day.
>>>
>>> On Fri, Aug 7, 2015 at 10:30 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>
>>>> From what I heard (from an ex-coworker who is an Oozie committer),
>>>> Azkaban is being phased out at LinkedIn because of scalability issues
>>>> (though UI-wise, Azkaban seems better).
>>>>
>>>> Vikram:
>>>> I suggest you do more research into the related projects (maybe via
>>>> their mailing lists).
>>>>
>>>> Disclaimer: I don't work for LinkedIn.
>>>>
>>>> On Fri, Aug 7, 2015 at 10:12 AM, Nick Pentreath <
>>>> nick.pentre...@gmail.com> wrote:
>>>>
>>>>> Hi Vikram,
>>>>>
>>>>> We use Azkaban (2.5.0) for our production workflow scheduling. We just
>>>>> use the local mode deployment, which is fairly easy to set up. It is
>>>>> pretty easy to use and has a nice scheduling and logging interface, as
>>>>> well as SLAs (e.g., kill the job and notify if it doesn't complete
>>>>> within 3 hours, or whatever).
>>>>>
>>>>> However, Spark is not supported directly - we run everything with
>>>>> shell scripts and spark-submit (see the sketch below). There is a
>>>>> plugin interface through which one could create a Spark plugin, but
>>>>> when I investigated I found it very cumbersome and didn't have the
>>>>> time to work through it.
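>>>>>
>>>>> For reference, here is roughly the shape of one of our job definitions
>>>>> (a minimal sketch - the paths, job names, and Spark options are
>>>>> placeholders, not our actual config):
>>>>>
>>>>>   # etl.job - Azkaban "command" job type wrapping a shell script
>>>>>   type=command
>>>>>   command=sh /opt/jobs/run_etl.sh
>>>>>   # only runs after both upstream jobs have finished
>>>>>   dependencies=ingest,cleanup
>>>>>
>>>>>   # run_etl.sh - plain shell wrapper around spark-submit
>>>>>   spark-submit \
>>>>>     --class com.example.EtlJob \
>>>>>     --master spark://spark-master:7077 \
>>>>>     /opt/jobs/etl-assembly.jar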
>>>>>
>>>>> It has some quirks, and while there is actually a REST API for adding
>>>>> jobs and scheduling them dynamically, it is not documented anywhere, so
>>>>> you kind of have to figure it out for yourself (rough example below).
>>>>> In terms of ease of use, though, I found it way better than Oozie. I
>>>>> haven't tried Chronos, which seemed quite involved to set up, and I
>>>>> haven't tried Luigi either.
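>>>>>
>>>>> To give a flavor of that API: you first POST credentials to get a
>>>>> session id, then pass it to the other endpoints. A rough sketch (the
>>>>> endpoint names and parameters are what I pieced together myself, so
>>>>> treat them as assumptions rather than documented behavior):
>>>>>
>>>>>   # authenticate; the JSON response contains a session.id
>>>>>   curl -k -X POST \
>>>>>     --data "action=login&username=azkaban&password=azkaban" \
>>>>>     https://azkaban-host:8443
>>>>>
>>>>>   # trigger a flow execution with that session id
>>>>>   curl -k --get \
>>>>>     --data "session.id=<id>&ajax=executeFlow&project=myproject&flow=etl" \
>>>>>     https://azkaban-host:8443/executor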
>>>>>
>>>>> Spark Job Server is good, but as you say it lacks some things like
>>>>> scheduling and DAG-type workflows (independent of Spark-defined job
>>>>> flows).
>>>>>
>>>>>
>>>>> On Fri, Aug 7, 2015 at 7:00 PM, Jörn Franke <jornfra...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Also check out Falcon in combination with Oozie.
>>>>>>
>>>>>> On Fri, Aug 7, 2015 at 5:51 PM, Hien Luu <h...@linkedin.com.invalid>
>>>>>> wrote:
>>>>>>
>>>>>>> Looks like Oozie can satisfy most of your requirements.
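>>>>>>>
>>>>>>> For instance, your "job A after B and C" case in point 3 maps onto a
>>>>>>> fork/join pair in Oozie's workflow XML. A skeleton (action bodies
>>>>>>> elided, names are placeholders):
>>>>>>>
>>>>>>>   <workflow-app name="example-wf" xmlns="uri:oozie:workflow:0.4">
>>>>>>>     <start to="fork-bc"/>
>>>>>>>     <fork name="fork-bc">
>>>>>>>       <path start="jobB"/>
>>>>>>>       <path start="jobC"/>
>>>>>>>     </fork>
>>>>>>>     <!-- jobB and jobC actions go here, each with ok to="join-bc" -->
>>>>>>>     <join name="join-bc" to="jobA"/>
>>>>>>>     <!-- jobA action goes here, with ok to="end" -->
>>>>>>>     <end name="end"/>
>>>>>>>   </workflow-app>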
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Aug 7, 2015 at 8:43 AM, Vikram Kone <vikramk...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I'm looking for open source workflow tools/engines that allow us to
>>>>>>>> schedule Spark jobs on a DataStax Cassandra cluster. Since there are
>>>>>>>> tonnes of alternatives out there like Oozie, Azkaban, Luigi, Chronos,
>>>>>>>> etc., I wanted to check with people here to see what they are using
>>>>>>>> today.
>>>>>>>>
>>>>>>>> Some of the requirements for the workflow engine I'm looking for
>>>>>>>> are:
>>>>>>>>
>>>>>>>> 1. First-class support for submitting Spark jobs on Cassandra - not
>>>>>>>> just some wrapper Java code to submit tasks.
>>>>>>>> 2. Active open source community support, and something well tested
>>>>>>>> at production scale.
>>>>>>>> 3. Should be dead easy to write job dependencies using XML or a web
>>>>>>>> interface, e.g., job A depends on job B and job C, so run job A
>>>>>>>> after B and C are finished (see the sketch after this list). We
>>>>>>>> shouldn't need to write full-blown Java applications to specify job
>>>>>>>> parameters and dependencies. It should be very simple to use.
>>>>>>>> 4. Time-based recurrent scheduling: run the Spark jobs at a given
>>>>>>>> time every hour, day, week, or month.
>>>>>>>> 5. Job monitoring, alerting on failures, and email notifications on
>>>>>>>> a daily basis.
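>>>>>>>>
>>>>>>>> To illustrate points 3 and 4, something about as simple as this
>>>>>>>> (hypothetical property-file syntax, purely to show the level of
>>>>>>>> effort I'd expect, not any particular tool's format) would be ideal:
>>>>>>>>
>>>>>>>>   # run job A only after jobs B and C have finished
>>>>>>>>   jobA.command=spark-submit --class com.example.JobA app.jar
>>>>>>>>   jobA.dependencies=jobB,jobC
>>>>>>>>   # run every day at 1 AM (cron-style schedule)
>>>>>>>>   jobA.schedule=0 1 * * *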
>>>>>>>>
>>>>>>>> I have looked at Ooyala's Spark Job Server, which seems to be geared
>>>>>>>> towards making Spark jobs run faster by sharing contexts between
>>>>>>>> jobs, but it isn't a full-blown workflow engine per se. A combination
>>>>>>>> of Spark Job Server and a workflow engine would be ideal.
>>>>>>>>
>>>>>>>> Thanks for the inputs
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>
