Unfortunately I don't think my change will be able to make it in as-is.

As Brian Wickman pointed out, it could introduce serious problems: timeouts
vary across the scheduler and executor, so if you set your wait time too
high, the scheduler might consider the tasks lost because they stayed in the
transient KILLING state for too long.
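
To make the constraint concrete, here is a rough sketch of the kind of check
a configurable wait would need. The 5 minute timeout, the 30 second margin,
and every name below are placeholders I made up for illustration; none of
them are taken from Aurora's code or flags.

```python
from datetime import timedelta

# Hypothetical values for illustration only, not Aurora defaults.
SCHEDULER_KILLING_TIMEOUT = timedelta(minutes=5)
SAFETY_MARGIN = timedelta(seconds=30)


def max_safe_wait(escalation_steps,
                  killing_timeout=SCHEDULER_KILLING_TIMEOUT,
                  margin=SAFETY_MARGIN):
    """Longest per-step wait that keeps the whole shutdown under the timeout."""
    if escalation_steps < 1:
        raise ValueError("need at least one escalation step")
    budget = killing_timeout - margin
    if budget <= timedelta(0):
        raise ValueError("margin leaves no time budget")
    # Each escalation step waits once, so the budget is split evenly.
    return budget / escalation_steps


def validate_wait(requested, escalation_steps, **kwargs):
    """Return the requested wait, or raise if it risks exceeding the timeout."""
    limit = max_safe_wait(escalation_steps, **kwargs)
    if requested > limit:
        raise ValueError(
            "wait of %s exceeds safe limit of %s" % (requested, limit))
    return requested
```

The point is only that the executor-side wait cannot be chosen independently
of whatever timeout the scheduler applies to KILLING.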

I do think the lifecycle modules idea would solve Stephan's issue.

On Tue, Mar 24, 2015 at 5:06 PM, Brian Brazil <brian.bra...@boxever.com>
wrote:

> On 24 March 2015 at 20:57, Erb, Stephan <stephan....@blue-yonder.com>
> wrote:
>
> > Hi everyone,
> >
> > we are implementing the /health endpoint in our services but omit the
> > implementation of the unauthenticated lifecycle methods /quitquitquit and
> > /abortabortabort.
> >
> > As a consequence, stopping a service is taxed by 10 seconds waiting time
> > [1]. I would like to get rid of this unnecessary delay and can think of
> > two solutions:
> >
> > a) Only perform the escalation wait when the http_signaler reports that
> > the message could be delivered to the service. This is a rather simple
> > and localized fix.
> >
> > b) Use another port for lifecycle events. This would require a new
> > addition to the task configuration and proper plumbing throughout the
> > rest of the system. Backward compatibility could be achieved by using
> > 'health' as the default lifecycle management port.
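
For reference, option (a) above could look roughly like the following. This
is a sketch with hypothetical names, not the actual thermos_task_runner or
http_signaler code: `signaler` is assumed to be any callable that returns
True when the request reached the service, and `wait_secs` stands in for the
escalation wait.

```python
import time


def escalate(signaler, wait_secs,
             endpoints=("/quitquitquit", "/abortabortabort"),
             sleep=time.sleep):
    """Signal each endpoint in order, waiting only after a delivered signal.

    Returns the list of endpoints that were delivered.
    """
    delivered = []
    for endpoint in endpoints:
        if signaler(endpoint):
            delivered.append(endpoint)
            # Give the service time to act before escalating further.
            sleep(wait_secs)
        # If delivery failed, skip the wait and escalate immediately.
    return delivered
```

A service that does not implement the lifecycle endpoints would then fall
through both steps without sleeping.
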
> >
> > Any thoughts? I would be happy with the simple solution, but in the end
> > it's your call :-)
> >
>
> __george mentioned on IRC working on a change that'll let the wait time be
> configurable (which is something I also need). Would that cover your use
> case?
>
> There were also discussions on IRC about custom lifecycle modules.
>
> Brian
>
>
> >
> > Best Regards,
> > Stephan
> >
> > [1] https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/executor/thermos_task_runner.py#L123
>
