Hi everyone,

we implement the /health endpoint in our services but do not implement the unauthenticated lifecycle methods /quitquitquit and /abortabortabort.
As a consequence, stopping a service incurs a 10-second wait [1]. I would like to get rid of this unnecessary delay and can think of two solutions:

a) Only perform the escalation wait when the http_signaler reports that the message could be delivered to the service. This is a rather simple and localized fix.

b) Use another port for lifecycle events. This would require a new addition to the task configuration and proper plumbing throughout the rest of the system. Backward compatibility could be achieved by using 'health' as the default lifecycle management port.

Any thoughts? I would be happy with the simple solution, but in the end it's your call :-)

Best Regards,
Stephan

[1] https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/executor/thermos_task_runner.py#L123
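
To make option a) a bit more concrete, here is a rough sketch of the idea. It is not the actual runner code: the function name stop_via_http, the signal_endpoint callable (assumed to return True only when the request was delivered) and the way the wait is split across the two endpoints are all assumptions made for illustration.

```python
import time


def stop_via_http(signal_endpoint, wait_secs, sleep=time.sleep,
                  endpoints=("/quitquitquit", "/abortabortabort")):
    """Signal each lifecycle endpoint, waiting only after a delivered request.

    signal_endpoint is a hypothetical callable taking an endpoint path and
    returning True if the request reached the service, False otherwise.
    Returns the total number of seconds spent waiting.
    """
    waited = 0.0
    for path in endpoints:
        delivered = signal_endpoint(path)
        if not delivered:
            # The service does not handle this endpoint, so waiting for it
            # to react would only delay the shutdown.
            continue
        # The request got through: give the service time to shut down
        # before escalating to the next step.
        sleep(wait_secs)
        waited += wait_secs
    return waited
```

With a service that implements neither endpoint, signal_endpoint would return False both times and the function would return immediately, so the caller could go straight to the next escalation step.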