In my opinion, whichever project you choose among its peers should
leave enough room for future growth (which may come faster than you
initially expect).

Cheers

On Fri, Aug 7, 2015 at 11:23 AM, Hien Luu <h...@linkedin.com> wrote:

> Scalability is a known issue due to the current architecture.  However,
> this only becomes relevant if you run more than 20K jobs per day.
>
> On Fri, Aug 7, 2015 at 10:30 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> From what I heard (an ex-coworker who is Oozie committer), Azkaban is
>> being phased out at LinkedIn because of scalability issues (though UI-wise,
>> Azkaban seems better).
>>
>> Vikram:
>> I suggest you do more research in related projects (maybe using their
>> mailing lists).
>>
>> Disclaimer: I don't work for LinkedIn.
>>
>> On Fri, Aug 7, 2015 at 10:12 AM, Nick Pentreath <nick.pentre...@gmail.com
>> > wrote:
>>
>>> Hi Vikram,
>>>
>>> We use Azkaban (2.5.0) for our production workflow scheduling. We just
>>> use a local mode deployment and it is fairly easy to set up. It is also
>>> easy to use and has a nice scheduling and logging interface, as well as
>>> SLA enforcement (e.g. kill the job and notify if it doesn't complete in
>>> 3 hours or whatever).
>>>
>>> However, Spark is not supported directly - we run everything with
>>> shell scripts and spark-submit. There is a plugin interface where one
>>> could create a Spark plugin, but when I did investigate I found it very
>>> cumbersome and didn't have the time to work through it.
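>>>
>>> For reference, each step in our setup is just an Azkaban .job
>>> properties file whose command calls a shell script wrapping
>>> spark-submit. A rough sketch (file names, job names and the cluster
>>> URL below are made up):
>>>
>>> ```properties
>>> # process_events.job - run a Spark app via a wrapper script
>>> type=command
>>> command=sh scripts/run_process_events.sh
>>> # only start after these jobs have succeeded
>>> dependencies=ingest_raw,load_lookup_tables
>>> ```
>>>
>>> where the wrapper script is essentially a single spark-submit call:
>>>
>>> ```shell
>>> #!/bin/sh
>>> # run_process_events.sh - submit the Spark job; the class, jar and
>>> # master URL are illustrative
>>> exec spark-submit \
>>>   --class com.example.ProcessEvents \
>>>   --master spark://spark-master:7077 \
>>>   /opt/jobs/process-events-assembly.jar "$@"
>>> ```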
>>>
>>> It has some quirks, and while there is actually a REST API for adding
>>> and dynamically scheduling jobs, it is not documented anywhere so you
>>> kinda have to figure it out for yourself. But in terms of ease of use I
>>> found it way better than Oozie. I haven't tried Chronos, which seemed
>>> quite involved to set up. Haven't tried Luigi either.
>>>
>>> Spark Job Server is good, but as you say it lacks some things like
>>> scheduling and DAG-type workflows (independent of Spark-defined job
>>> flows).
>>>
>>>
>>> On Fri, Aug 7, 2015 at 7:00 PM, Jörn Franke <jornfra...@gmail.com>
>>> wrote:
>>>
>>>> Check also Falcon in combination with Oozie.
>>>>
>>>> On Fri, Aug 7, 2015 at 5:51 PM, Hien Luu <h...@linkedin.com.invalid>
>>>> wrote:
>>>>
>>>>> Looks like Oozie can satisfy most of your requirements.
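>>>>>
>>>>> For example, your requirement 3 (run Job A after B and C) maps onto
>>>>> an Oozie workflow with a fork/join, and requirement 4 onto an Oozie
>>>>> coordinator. A rough sketch of the workflow part (the names are made
>>>>> up, and each action body - typically a shell action wrapping
>>>>> spark-submit - is elided):
>>>>>
>>>>> ```xml
>>>>> <workflow-app name="example-wf" xmlns="uri:oozie:workflow:0.5">
>>>>>   <start to="fork-bc"/>
>>>>>   <!-- run Job B and Job C in parallel -->
>>>>>   <fork name="fork-bc">
>>>>>     <path start="jobB"/>
>>>>>     <path start="jobC"/>
>>>>>   </fork>
>>>>>   <action name="jobB">
>>>>>     <!-- shell/spark action elided -->
>>>>>     <ok to="join-bc"/>
>>>>>     <error to="fail"/>
>>>>>   </action>
>>>>>   <action name="jobC">
>>>>>     <!-- shell/spark action elided -->
>>>>>     <ok to="join-bc"/>
>>>>>     <error to="fail"/>
>>>>>   </action>
>>>>>   <!-- Job A starts only after both B and C succeed -->
>>>>>   <join name="join-bc" to="jobA"/>
>>>>>   <action name="jobA">
>>>>>     <!-- shell/spark action elided -->
>>>>>     <ok to="end"/>
>>>>>     <error to="fail"/>
>>>>>   </action>
>>>>>   <kill name="fail">
>>>>>     <message>Workflow failed</message>
>>>>>   </kill>
>>>>>   <end name="end"/>
>>>>> </workflow-app>
>>>>> ```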
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Aug 7, 2015 at 8:43 AM, Vikram Kone <vikramk...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I'm looking for open source workflow tools/engines that allow us to
>>>>>> schedule Spark jobs on a DataStax Cassandra cluster. Since there are
>>>>>> tonnes of alternatives out there like Oozie, Azkaban, Luigi, Chronos
>>>>>> etc., I wanted to check with people here to see what they are using
>>>>>> today.
>>>>>>
>>>>>> Some of the requirements of the workflow engine that I'm looking for
>>>>>> are
>>>>>>
>>>>>> 1. First-class support for submitting Spark jobs on Cassandra, not
>>>>>> some wrapper Java code to submit tasks.
>>>>>> 2. Active open source community support and well tested at production
>>>>>> scale.
>>>>>> 3. Should be dead easy to write job dependencies using XML or a web
>>>>>> interface. E.g. job A depends on Job B and Job C, so run Job A after
>>>>>> B and C are finished. Shouldn't need to write full-blown Java
>>>>>> applications to specify job parameters and dependencies. Should be
>>>>>> very simple to use.
>>>>>> 4. Time-based recurrent scheduling. Run the Spark jobs at a given
>>>>>> time every hour, day, week or month.
>>>>>> 5. Job monitoring, alerting on failures, and email notifications on
>>>>>> a daily basis.
>>>>>>
>>>>>> I have looked at Ooyala's Spark Job Server, which seems to be geared
>>>>>> towards making Spark jobs run faster by sharing contexts between the
>>>>>> jobs, but it isn't a full-blown workflow engine per se. A combination
>>>>>> of Spark Job Server and a workflow engine would be ideal.
>>>>>>
>>>>>> Thanks for the inputs
>>>>>>
>>>>>
>>>>>
>>>
>>
>
