Thx for answering! See inline for my thoughts (or misunderstanding? ^^)

> Andy, doesn't Marathon handle fault tolerance amongst its apps? i.e. if
> you say that N instances of an app are running, and one shuts off,
> then it spins up another one, no?

Yes indeed, but what I'm wondering is how to know how many instances we need. It depends purely on the amount of resources consumed by the drivers, so it fluctuates over time. My current thinking is that the JobServer could ask Mesos for resources based on the total resources needed by the jobs it currently manages (so the jobs themselves would have to be able to report that info). Then (perhaps) Marathon could be (hot-)tuned to maintain N+M or N-M instances depending on the load... But maybe I'm crossing some boundaries here, the ones of auto-scaling :-/
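A minimal sketch of that sizing rule, in Python purely for illustration. It assumes each managed job can report the CPU and memory its driver needs (the hypothetical `DriverDemand` record) and that every Job Server instance launched by Marathon gets a fixed resource slice (the hypothetical `InstanceCapacity`); nothing in this thread says the Job Server exposes either. It pools all demands, so the result is a lower bound on N, and the difference from the current instance count is the +M / -M adjustment.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DriverDemand:
    """Resources one job's Spark driver is expected to need (hypothetical; reported by the job)."""
    cpus: float
    mem_mb: int


@dataclass(frozen=True)
class InstanceCapacity:
    """Resources granted to one Job Server instance (hypothetical fixed Marathon slice)."""
    cpus: float
    mem_mb: int


def required_instances(demands: Iterable[DriverDemand],
                       capacity: InstanceCapacity,
                       min_instances: int = 1) -> int:
    """Smallest N whose pooled CPU and memory cover the total driver demand.

    Demands are summed as one pool, so this is a lower bound: it ignores
    that a single driver cannot be split across two instances.
    """
    if capacity.cpus <= 0 or capacity.mem_mb <= 0:
        raise ValueError("instance capacity must be positive")
    demands = list(demands)
    total_cpus = sum(d.cpus for d in demands)
    total_mem = sum(d.mem_mb for d in demands)
    by_cpu = math.ceil(total_cpus / capacity.cpus)
    by_mem = math.ceil(total_mem / capacity.mem_mb)
    # Keep at least min_instances running for fault tolerance, even when idle.
    return max(min_instances, by_cpu, by_mem)


def scaling_delta(current: int,
                  demands: Iterable[DriverDemand],
                  capacity: InstanceCapacity,
                  min_instances: int = 1) -> int:
    """Positive: add that many instances. Negative: remove that many."""
    return required_instances(demands, capacity, min_instances) - current


if __name__ == "__main__":
    jobs = [DriverDemand(1.0, 2048), DriverDemand(0.5, 4096), DriverDemand(2.0, 3072)]
    per_instance = InstanceCapacity(cpus=2.0, mem_mb=4096)
    print(required_instances(jobs, per_instance))  # 3: memory (9216 MB) dominates
    print(scaling_delta(2, jobs, per_instance))    # 1: add one instance
```

Because a driver cannot be split across instances, a real placement would need something like first-fit packing on top of this bound. The resulting N could then be pushed to Marathon by updating the app's `instances` count through its REST API (e.g. `PUT /v2/apps/{appId}`).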
>
> The tricky thing was that I was planning to use Akka Cluster to
> coordinate, but Mesos itself can be used to coordinate as well, which
> is an overlap... but I didn't want to make job server HA reliant
> only on Mesos...

You mean using Akka Cluster to dispatch jobs to the managed (Job Server) nodes? That's actually interesting as well, but I guess it would duplicate some of what Mesos or YARN already do (that is, resource management), right?

> Anyways, we can discuss offline if needed.

Definitely, let's stop polluting the list!!!

C ya
andy

>
> On Thu, Mar 20, 2014 at 1:35 AM, andy petrella <andy.petre...@gmail.com>
> wrote:
> > Heya,
> > That's cool you've already hacked something for this in the scripts!
> >
> > I have a related question: how would it actually work? I mean, to make
> > this Job Server fault tolerant using Marathon, I would guess that it
> > would itself need to be a Mesos framework, able to publish its resource
> > needs. And for that, the Job Server also has to be aware of the
> > resources needed by the Spark drivers it will run, which is not easy
> > to guess unless the job itself provides them?
> >
> > I didn't check the Job Server deeply enough, so it might already be
> > the case (or I'm saying something completely dumb ^^).
> >
> > For sure, we'll try to share it when we reach the point of deploying
> > with Marathon (which should happen in April).
> >
> > greetz and again, Nice Work Evan!
> >
> > Ndi
> >
> > On Wed, Mar 19, 2014 at 7:27 AM, Evan Chan <e...@ooyala.com> wrote:
> >
> >> Andy,
> >>
> >> Yeah, we've thought of deploying this on Marathon ourselves, but we're
> >> not sure how much Mesos we're going to use yet. (Indeed if you look
> >> at bin/server_start.sh, I think I set up the PORT environment var
> >> specifically for Marathon.) This is also why we have deploy scripts
> >> which package into .tar.gz, again for Mesos deployment.
> >>
> >> If you do try this, please let us know. :)
> >>
> >> -Evan
> >>
> >> On Tue, Mar 18, 2014 at 3:57 PM, andy petrella <andy.petre...@gmail.com>
> >> wrote:
> >> > tadaaaa! That's awesome.
> >> >
> >> > A quick question: does anyone have insights into deploying such
> >> > JobServers with Marathon on Mesos?
> >> >
> >> > I'm thinking about an architecture where Marathon would deploy the Job
> >> > Servers and keep them running, as part of the whole set of apps
> >> > deployed on it, according to the resources needed (à la Jenkins).
> >> >
> >> > Any idea is welcome.
> >> >
> >> > Back to the news, Evan + Ooyala team: Great Job again.
> >> >
> >> > andy
> >> >
> >> > On Tue, Mar 18, 2014 at 11:39 PM, Henry Saputra <henry.sapu...@gmail.com>
> >> > wrote:
> >> >
> >> >> W00t!
> >> >>
> >> >> Thanks for releasing this, Evan.
> >> >>
> >> >> - Henry
> >> >>
> >> >> On Tue, Mar 18, 2014 at 1:51 PM, Evan Chan <e...@ooyala.com> wrote:
> >> >> > Dear Spark developers,
> >> >> >
> >> >> > Ooyala is happy to announce that we have pushed our official, Spark
> >> >> > 0.9.0 / Scala 2.10-compatible, job server as a GitHub repo:
> >> >> >
> >> >> > https://github.com/ooyala/spark-jobserver
> >> >> >
> >> >> > Complete with unit tests, deploy scripts, and examples.
> >> >> >
> >> >> > The original PR (#222) on incubator-spark is now closed.
> >> >> >
> >> >> > Please have a look; pull requests are very welcome.
> >> >> > --
> >> >> > --
> >> >> > Evan Chan
> >> >> > Staff Engineer
> >> >> > e...@ooyala.com |