Re: Making sense of Aurora terminal states

2015-02-20 Thread Hussein Elgridly
This is fantastic (and I'm glad that my understanding was mostly correct) - thanks a lot. Might I suggest folding this information into the user guide? Maybe it's only relevant for my use case, but I feel like "tasks in terminal states might be cloned and rescheduled; here's when that might happen

Re: Making sense of Aurora terminal states

2015-02-20 Thread Hussein Elgridly
>> 5. A job in the LOST state will always be rescheduled unless it went
>> through KILLING first. (What does this represent - killed by user and then
>> lost connectivity to the slave?)
>
> True. That is one way it could happen, it could also happen if the
> scheduler times the task out while wa
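The rule quoted in point 5 can be sketched in a few lines of Python. This is only an illustration of the statement as written in the thread, not Aurora's scheduler code: the function name and the list-of-strings history are hypothetical, and only the state names (LOST, KILLING, and so on) come from the discussion.

```python
# Hypothetical helper, not part of Aurora: it encodes only the rule quoted
# in the thread ("LOST is rescheduled unless it went through KILLING first").
def lost_task_is_rescheduled(state_history):
    """Return True if a task that ended in LOST would be rescheduled.

    state_history is an assumed representation: the ordered list of state
    names the task passed through, ending in "LOST".
    """
    if not state_history or state_history[-1] != "LOST":
        raise ValueError("expected a state history that ends in LOST")
    # A kill request (KILLING) anywhere before LOST suppresses rescheduling.
    return "KILLING" not in state_history


if __name__ == "__main__":
    print(lost_task_is_rescheduled(["PENDING", "ASSIGNED", "RUNNING", "LOST"]))
    print(lost_task_is_rescheduled(["PENDING", "ASSIGNED", "RUNNING", "KILLING", "LOST"]))
```

The first call models a task lost while running; the second models a task lost after a kill was requested, whether by a user or by another path into KILLING.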

Re: Making sense of Aurora terminal states

2015-02-20 Thread Hussein Elgridly
Also (sorry for repeated messages), what's the deal with KILLING -> [FINISHED, FAILED]? User sends kill request but Mesos reports it's done before it gets through so congratulations, you get to keep it?
Hussein Elgridly
Senior Software Engineer, DSDE
The Broad Institute of MIT and Harvard
On 20

Re: [VOTE] Graduate Apache Aurora as a TLP

2015-02-20 Thread Dobromir Montauk
+1
On Thu, Feb 19, 2015 at 5:23 PM, Bhuvan Arumugam wrote:
> +1
>
> Sent from my iPhone
>
> > On Feb 18, 2015, at 5:26 PM, Jake Farrell wrote:
> >
> > Based on community discussions on the project mailing lists and the current
> > state of Apache Aurora (incubating) I would like to start a co

Re: [VOTE] Graduate Apache Aurora as a TLP

2015-02-20 Thread Brian Wickman
+1
On Wed, Feb 18, 2015 at 5:26 PM, Jake Farrell wrote:
> Based on community discussions on the project mailing lists and the current
> state of Apache Aurora (incubating) I would like to start a community VOTE
> for Apache Aurora (incubating) to graduate from the Incubator and become a
> Top Le

Re: [VOTE] Graduate Apache Aurora as a TLP

2015-02-20 Thread Joseph Jacks
+1
Thanks, JJ.
> On Feb 18, 2015, at 5:26 PM, Jake Farrell wrote:
>
> Based on community discussions on the project mailing lists and the current
> state of Apache Aurora (incubating) I would like to start a community VOTE
> for Apache Aurora (incubating) to graduate from the Incubator and bec

Heads up - breaking changes coming to API backing beta-update

2015-02-20 Thread Bill Farner
Hi folks,
As part of AURORA-1093 [1], I will be breaking backwards compatibility on the thrift API that backs the beta-update command. This is part of the contract that we hope to communicate by tagging features as beta - they are subject to change swiftly.
*What this means for you*
If you do no

Build failed in Jenkins: Aurora #889

2015-02-20 Thread Apache Jenkins Server
See Changes: [zmanji] Remove single caller methods from AuroraCommandContext [wickman] Instrument the HealthChecker to export stats. -- [...truncated 4179 lines...] src/test/python/a

Re: Build failed in Jenkins: Aurora #889

2015-02-20 Thread Zameer Manji
The health checker test seems flaky or incorrect. Wickman, can you please take a look?
On Fri, Feb 20, 2015 at 4:15 PM, Apache Jenkins Server <jenk...@builds.apache.org> wrote:
> See
>
> Changes:
>
> [zmanji] Remove single caller methods from Au

Re: Build failed in Jenkins: Aurora #889

2015-02-20 Thread Brian Wickman
Looking.
On Fri, Feb 20, 2015 at 4:17 PM, Zameer Manji wrote:
> The health checker test seems flaky or incorrect. Wickman, can you please
> take a look?
>
> On Fri, Feb 20, 2015 at 4:15 PM, Apache Jenkins Server <jenk...@builds.apache.org> wrote:
>
>> See

Re: Build failed in Jenkins: Aurora #889

2015-02-20 Thread Brian Wickman
There might be a preexisting bug in the test [] Time now: 0.0 File "/private/var/folders/rd/_tjz8zts3g14md1kmf38z6w8gn/T/tmp5K5MuG/.deps/twitter.common.exceptions-0.3.3-py2-none-any.whl/twitter/common/exceptions/__init__.py", line 126, in _excepting

Build failed in Jenkins: Aurora #890

2015-02-20 Thread Apache Jenkins Server
See Changes: [wfarner] Refactor existing write APIs for job updates to use IJobUpdateKey. -- [...truncated 4237 lines...] generated xml file:

RFC HealthCheck

2015-02-20 Thread Florian Pfeiffer
Hi, I would like to start working on the Healthchecker.
1) Enable configuration of the port name on which to run health checks (this should also tackle AURORA-321). This seems like a very small change consisting of adding a new variable named "port" to the HealthCheckConfig in base.py with a def