Re: Spark job workflow engine recommendations

2015-11-18 Thread Fengdong Yu
Yes, you can submit jobs remotely. > On Nov 19, 2015, at 10:10 AM, Vikram Kone wrote: > > Hi Feng, > Does airflow allow remote submissions of spark jobs via spark-submit? > > On Wed, Nov 18, 2015 at 6:01 PM, Fengdong Yu wrote: > Hi, > > we use 'Airflow' as

Re: Spark job workflow engine recommendations

2015-11-18 Thread Vikram Kone
Hi Feng, Does airflow allow remote submissions of spark jobs via spark-submit? On Wed, Nov 18, 2015 at 6:01 PM, Fengdong Yu wrote: > Hi, > > we use 'Airflow' as our job workflow scheduler. > > > > > On Nov 19, 2015, at 9:47 AM, Vikram Kone wrote: > > Hi Nick, > Quick question about spark-submi

Re: Spark job workflow engine recommendations

2015-11-18 Thread Fengdong Yu
Hi, we use 'Airflow' as our job workflow scheduler. > On Nov 19, 2015, at 9:47 AM, Vikram Kone wrote: > > Hi Nick, > Quick question about spark-submit command executed from azkaban with command > job type. > I see that when I press kill in azkaban portal on a spark-submit job, it > doesn'

Re: Spark job workflow engine recommendations

2015-11-18 Thread Vikram Kone
Hi Nick, Quick question about spark-submit command executed from azkaban with command job type. I see that when I press kill in azkaban portal on a spark-submit job, it doesn't actually kill the application on spark master and it continues to run even though azkaban thinks that it's killed. How do

Re: Spark job workflow engine recommendations

2015-10-07 Thread Nick Pentreath
We're also using Azkaban for scheduling, and we simply use spark-submit via shell scripts. It works fine. The auto retry feature with a large number of retries (like 100 or 1000 perhaps) should take care of long-running jobs with restarts on failure. We haven't used it for streaming yet tho
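
A minimal sketch of the retry-on-failure idea described above, written in Python rather than as a shell script. It does not reproduce Azkaban's own auto retry feature; it only illustrates re-running spark-submit until it exits cleanly. The function names `build_submit_command` and `run_with_retries`, and any master URL, class name, or jar path passed to them, are hypothetical placeholders and not taken from the thread.

```python
import subprocess
import time


def build_submit_command(master, main_class, app_jar, app_args=()):
    """Build a spark-submit argument list. All values are caller-supplied placeholders."""
    return ["spark-submit", "--master", master, "--class", main_class, app_jar, *app_args]


def run_with_retries(command, max_retries=3, backoff_seconds=0.0, runner=subprocess.call):
    """Run `command` until it exits with 0 or the retries are exhausted.

    `runner` takes the argument list and returns an exit code; it defaults to
    subprocess.call and can be swapped out for testing.
    Returns a tuple (last_exit_code, number_of_attempts).
    """
    attempts = 0
    exit_code = None
    while attempts <= max_retries:
        attempts += 1
        exit_code = runner(command)
        if exit_code == 0:
            break
        # Wait before the next attempt, but not after the final failure.
        if attempts <= max_retries:
            time.sleep(backoff_seconds)
    return exit_code, attempts
```

With a large `max_retries`, a wrapper like this restarts a failed job in the same spirit as the scheduler-level retries mentioned above. Whether that suits streaming jobs is left open in the thread.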

Re: Spark job workflow engine recommendations

2015-10-07 Thread Vikram Kone
Hien, I saw this pull request and from what I understand this is geared towards running spark jobs over hadoop. We are using spark over cassandra and not sure if this new jobtype supports that. I haven't seen any documentation in regards to how to use this spark job plugin, so that I can test it ou

Re: Spark job workflow engine recommendations

2015-10-07 Thread Hien Luu
The spark job type was added recently - see this pull request https://github.com/azkaban/azkaban-plugins/pull/195. You can leverage the SLA feature to kill a job if it ran longer than expected. BTW, we just solved the scalability issue by supporting multiple executors. Within a week or two, the

Re: Spark job workflow engine recommendations

2015-10-06 Thread Vikram Kone
Does Azkaban support scheduling long running jobs like spark streaming jobs? Will Azkaban kill a job if it's running for a long time? On Friday, August 7, 2015, Vikram Kone wrote: > Hien, > Is Azkaban being phased out at linkedin as rumored? If so, what's linkedin > going to use for workflow sche

Re: Spark job workflow engine recommendations

2015-08-11 Thread Nick Pentreath
I also tend to agree that Azkaban is somewhat easier to get set up. Though I haven't used the new UI for Oozie that is part of CDH, so perhaps that is another good option. It's a pity Azkaban is a little rough in terms of documenting its API, and the scalability is an issue. However it would

Re: Spark job workflow engine recommendations

2015-08-11 Thread Vikram Kone
Hi Lars, thanks for the brain dump. The points you made about target audience, degree of high availability and time based scheduling instead of event based scheduling are all valid and make sense. In our case, most of our devs are .NET based, so XML or web based scheduling is preferred over

Re: Spark job workflow engine recommendations

2015-08-11 Thread Ruslan Dautkhanov
We use Talend, but not for Spark workflows, although it does have Spark components. https://www.talend.com/download/talend-open-studio It is free (commercial support available), easy to design and deploy workflows. Talend for BigData 6.0 was released a month ago. Is anybody using Talend for Spa

Re: Spark job workflow engine recommendations

2015-08-11 Thread Hien Luu
We are in the middle of figuring that out. At a high level, we want to combine the best parts of existing workflow solutions. On Fri, Aug 7, 2015 at 3:55 PM, Vikram Kone wrote: > Hien, > Is Azkaban being phased out at linkedin as rumored? If so, what's linkedin > going to use for workflow sch

Re: Spark job workflow engine recommendations

2015-08-09 Thread Lars Albertsson
I used to maintain Luigi at Spotify, and gained some insight into workflow manager characteristics and production behaviour in the process. I am evaluating options for my current employer, and the short list is basically: Luigi, Azkaban, Pinball, Airflow, and rolling our own. The latter is not necessar

Re: Spark job workflow engine recommendations

2015-08-07 Thread Vikram Kone
Hien, Is Azkaban being phased out at linkedin as rumored? If so, what's linkedin going to use for workflow scheduling? Is there something else that's going to replace Azkaban? On Fri, Aug 7, 2015 at 11:25 AM, Ted Yu wrote: > In my opinion, choosing some particular project among its peers should

Re: Spark job workflow engine recommendations

2015-08-07 Thread Ted Yu
In my opinion, choosing some particular project among its peers should leave enough room for future growth (which may come faster than you initially think). Cheers On Fri, Aug 7, 2015 at 11:23 AM, Hien Luu wrote: > Scalability is a known issue due to the current architecture. However > this w

Re: Spark job workflow engine recommendations

2015-08-07 Thread Hien Luu
Scalability is a known issue due to the current architecture. However, this only applies if you run more than 20K jobs per day. On Fri, Aug 7, 2015 at 10:30 AM, Ted Yu wrote: > From what I heard (an ex-coworker who is Oozie committer), Azkaban is > being phased out at LinkedIn because of scala

Re: Spark job workflow engine recommendations

2015-08-07 Thread Vikram Kone
Oh ok. That's a good enough reason against azkaban then. So looks like Oozie is the best choice here. On Friday, August 7, 2015, Ted Yu wrote: > From what I heard (an ex-coworker who is Oozie committer), Azkaban is > being phased out at LinkedIn because of scalability issues (though UI-wise, > A

Re: Spark job workflow engine recommendations

2015-08-07 Thread Ted Yu
From what I heard (an ex-coworker who is an Oozie committer), Azkaban is being phased out at LinkedIn because of scalability issues (though UI-wise, Azkaban seems better). Vikram: I suggest you do more research in related projects (maybe using their mailing lists). Disclaimer: I don't work for Link

Re: Spark job workflow engine recommendations

2015-08-07 Thread Nick Pentreath
Hi Vikram, We use Azkaban (2.5.0) in our production workflow scheduling. We just use local mode deployment and it is fairly easy to set up. It is pretty easy to use and has a nice scheduling and logging interface, as well as SLAs (like kill job and notify if it doesn't complete in 3 hours or whate

Re: Spark job workflow engine recommendations

2015-08-07 Thread Jörn Franke
Also check out Falcon in combination with Oozie. On Fri, Aug 7, 2015 at 5:51 PM, Hien Luu wrote: > Looks like Oozie can satisfy most of your requirements. > > > > On Fri, Aug 7, 2015 at 8:43 AM, Vikram Kone wrote: > >> Hi, >> I'm looking for open source workflow tools/engines that allow us to >> sch

Re: Spark job workflow engine recommendations

2015-08-07 Thread Vikram Kone
Thanks for the suggestion, Hien. I'm curious why not Azkaban from LinkedIn. From what I read online, Oozie was very cumbersome to set up and use compared to Azkaban. Since you are from LinkedIn, I wanted to get some perspective on what it lacks compared to Oozie. Ease of use is very important more than

Re: Spark job workflow engine recommendations

2015-08-07 Thread Hien Luu
Looks like Oozie can satisfy most of your requirements. On Fri, Aug 7, 2015 at 8:43 AM, Vikram Kone wrote: > Hi, > I'm looking for open source workflow tools/engines that allow us to > schedule spark jobs on a datastax cassandra cluster. Since there are tonnes > of alternatives out there like