Hi Florian,

Have you looked at what Mesos already offers out of the box [1]? Maybe
there is a way to implement your features by relying on Mesos directly, instead
of making the Aurora implementation more flexible.

As you've mentioned, the lifecycle endpoints abort and quit seem to be quite
orthogonal to the health checking idea. I would be in favor of separating the
different concepts. I even thought about this yesterday, because in our
environment we only want health checking but now also have to pay a price of
10 seconds of additional latency when stopping jobs, due to the graceful kill
escalation.

[1] https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L141


Regards,
Stephan

________________________________________
From: Florian Pfeiffer <florian.pfeif...@gutefrage.net>
Sent: Saturday, February 21, 2015 4:27 AM
To: dev@aurora.incubator.apache.org
Subject: RFC HealthCheck

Hi,

I would like to start working on the Healthchecker:

1) Enable configuration of the port name on which to run health checks (this
should also tackle AURORA-321).
This seems like a very small change consisting of adding a new variable named
„port“ to the HealthCheckConfig in base.py with a default value of „health“ to
be backwards compatible. Any pitfalls? Any objections?
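
To make (1) concrete, here is a minimal sketch of the idea. It is not the
actual code in base.py: the dataclass is only a stand-in for the real config
struct, the fields other than „port“ are illustrative, and
resolve_health_port and the assigned_ports mapping are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthCheckConfig:
    # Stand-in for the real struct in base.py; the two timing fields
    # are illustrative placeholders, not a statement about its schema.
    interval_secs: float = 10.0
    timeout_secs: float = 1.0
    # Proposed addition: name of the port to probe. Defaulting to
    # "health" keeps existing configurations working unchanged.
    port: str = "health"


def resolve_health_port(config, assigned_ports):
    """Return the numeric port assigned to the configured port name.

    `assigned_ports` is a hypothetical mapping of port name to number.
    """
    try:
        return assigned_ports[config.port]
    except KeyError:
        raise ValueError("no port named %r assigned to this task" % config.port)
```

With the default, resolve_health_port(HealthCheckConfig(), ports) still looks
up „health“, so existing jobs would not need to change anything.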

2) There’s at least one ticket in Jira about making the endpoints for the
health check configurable. I would like to have a health check that works on
HTTP status codes, and there might be other people who are fine with a health
check that just tests whether it’s possible to make a TCP connection.

For my use case I would probably be fine if I add a variable „method“ to the
HealthCheckConfig, with a default value of „classic“ for the current behavior
and something like „statuscode“ for a check that is very similar to the current
one in http_signaler.py but checks the status code instead of parsing the
response (with the downside that the endpoints /health, /abort and
/quitquitquit are still hardcoded).

Any ideas how this can be made a little more generic, so that if we have 3-5
different types of health checks, each one can take its own arguments? (e.g.
expected_response for the current one, expected_code for the status code
checker, and maybe something like max_response_time for defining how fast
traffic has to appear on a TCP connection check)
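
One possible shape, purely as a sketch: a registry that maps the method name to
a checker function, where each checker declares its own keyword arguments. The
names ProbeResult, CHECKERS and evaluate are hypothetical, and the „classic“
comparison against "ok" is my assumption about the current behavior, not a
copy of http_signaler.py.

```python
from collections import namedtuple

# What a single probe observed. For a plain TCP check, status and body
# would simply be None.
ProbeResult = namedtuple("ProbeResult", ["status", "body", "elapsed_secs"])


def check_classic(result, expected_response="ok"):
    # Compare the response body, ignoring case and surrounding whitespace.
    if result.body is None:
        return False
    return result.body.strip().lower() == expected_response.strip().lower()


def check_statuscode(result, expected_code=200):
    # Only the HTTP status code matters; the body is ignored.
    return result.status == expected_code


def check_tcp(result, max_response_time=1.0):
    # Healthy if the connection responded within the allowed time.
    return result.elapsed_secs <= max_response_time


CHECKERS = {
    "classic": check_classic,
    "statuscode": check_statuscode,
    "tcp": check_tcp,
}


def evaluate(method, args, result):
    """Run the checker registered under `method` with its own arguments."""
    if method not in CHECKERS:
        raise ValueError("unknown health check method: %r" % method)
    return CHECKERS[method](result, **args)
```

An argument that a checker does not know (say, expected_code passed to
„classic“) raises a TypeError, so a mismatched configuration fails loudly
instead of being silently ignored.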


A side question: to me it seems like /health and (/abort & /quitquitquit) are
not very closely related. Does it make sense to have those 3 things grouped in
the HealthCheck?


Best,
Florian

