Re: Making sense of Aurora terminal states

2015-02-21 Thread Hussein Elgridly
> You seem like you are now sufficiently-equipped to add this doc. Any
> chance you're game to write the doc you wish you had read? :-)

Possibly. Time constraints aside, my concern is that the questions I've asked (and the answers I was seeking) were based on the assumption that my jobs all had u

Re: Making sense of Aurora terminal states

2015-02-21 Thread Bill Farner
> > Might I suggest folding this information into the user guide?

You seem like you are now sufficiently-equipped to add this doc. Any chance you're game to write the doc you wish you had read? :-)

> Just to be absolutely clear on this: KILLING -> LOST will _never_ result in
> a reschedule?

What

Re: Making sense of Aurora terminal states

2015-02-20 Thread Hussein Elgridly
Also (sorry for repeated messages), what's the deal with KILLING -> [FINISHED, FAILED]? User sends kill request but Mesos reports it's done before it gets through so congratulations, you get to keep it?

Hussein Elgridly
Senior Software Engineer, DSDE
The Broad Institute of MIT and Harvard

On 20

Re: Making sense of Aurora terminal states

2015-02-20 Thread Hussein Elgridly
>> 5. A job in the LOST state will always be rescheduled unless it went
>> through KILLING first. (What does this represent - killed by user and then
>> lost connectivity to the slave?)
>>
> True. That is one way it could happen, it could also happen if the
> scheduler times the task out while wa
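
The rule confirmed in the reply above (a task that ends in LOST is rescheduled unless it passed through KILLING first) can be written as a small predicate over a task's state history. This is a minimal sketch under stated assumptions, not Aurora's API: the state names mirror Aurora task states mentioned in this thread (only a subset), while `lost_task_rescheduled` and the idea of passing an ordered list of states are hypothetical. It deliberately returns `None` for tasks that did not end in LOST, since whether those are rescheduled depends on things this rule does not cover (for example, job type and retry settings).

```python
from enum import Enum
from typing import List, Optional


class ScheduleStatus(Enum):
    # Subset of Aurora task states mentioned in this thread (not the full set).
    RUNNING = "RUNNING"
    KILLING = "KILLING"
    FINISHED = "FINISHED"
    FAILED = "FAILED"
    LOST = "LOST"


def lost_task_rescheduled(history: List[ScheduleStatus]) -> Optional[bool]:
    """Hypothetical helper applying the LOST rule discussed in this thread.

    `history` is the ordered list of states the task passed through.
    Returns True if a LOST task would be rescheduled, False if it went
    through KILLING first, and None if the task did not end in LOST.
    """
    if not history or history[-1] is not ScheduleStatus.LOST:
        return None  # the rule only speaks to tasks that ended in LOST
    # Lost after a kill request: no reschedule. Lost otherwise: reschedule.
    return ScheduleStatus.KILLING not in history[:-1]
```

For example, `[RUNNING, LOST]` yields `True`, `[RUNNING, KILLING, LOST]` yields `False`, and `[RUNNING, FINISHED]` yields `None`.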

Re: Making sense of Aurora terminal states

2015-02-20 Thread Hussein Elgridly
This is fantastic (and I'm glad that my understanding was mostly correct) - thanks a lot. Might I suggest folding this information into the user guide? Maybe it's only relevant for my use case, but I feel like "tasks in terminal states might be cloned and rescheduled; here's when that might happen

Re: Making sense of Aurora terminal states

2015-02-19 Thread Bill Farner
On Thu, Feb 19, 2015 at 1:27 PM, Hussein Elgridly <huss...@broadinstitute.org> wrote:

> I've just spent the afternoon making a flowchart out of
> TaskStateMachine.java in an attempt to figure out what Aurora states
> actually mean. Given that all the jobs I submit have unique names and I
> don't

Making sense of Aurora terminal states

2015-02-19 Thread Hussein Elgridly
I've just spent the afternoon making a flowchart out of TaskStateMachine.java in an attempt to figure out what Aurora states actually mean. Given that all the jobs I submit have unique names and I don't permit retries, I would like to put together a set of rules that determine whether a job is _rea