That's a good question and I'm not really sure of the historical reasons
why they are not; maybe someone with more historical wisdom will chime in.
I know that I put up https://review.openstack.org/#/c/12759/ many years
ago (the commentary there may be useful for historical investigation)...
Andy Botting wrote:
Thanks to Simon, Josh and Kris who replied to my last email about the
healthcheck middleware - these are now working well for us.
I'm sure there are plenty of operators, like us, who didn't know this
existed.
Is there any reason why they're not enabled by default?
cheers,
Andy
On 30 April 2016 at 11:52, Joshua Harlow <harlo...@fastmail.com> wrote:
This can help you more easily view what the healthcheck middleware can
show (especially in detailed mode); it can expose thread stacks and the
like, which is useful for debugging stuck servers (similar in concept to
apache mod_status).
https://review.openstack.org/#/c/311482/
Run the code from the above review like:
$ python oslo_middleware/healthcheck/ -p 8000
Then open a browser to http://127.0.0.1:8000/ (or other port).
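The output should also be available as JSON via the Accept header, so
(assuming the standalone server from that review serves the healthcheck
app at /) something like this ought to work from the command line too:
$ curl -H "Accept: application/json" http://127.0.0.1:8000/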
-Josh
Joshua Harlow wrote:
Yup, that healthcheck middleware was made more advanced by me. If you
need to do anything special with it, let me know and I can help make
that possible (or at least point out what might need to be changed to do
that).
Simon Pasquier wrote:
Hi,
On Thu, Apr 28, 2016 at 5:13 AM, Andy Botting <a...@andybotting.com> wrote:
We're running our services clustered behind an F5 loadbalancer in
production, and haproxy in our testing environment. This setup works
quite well for us, but I'm not that happy with testing the health of
our endpoints.
We're currently calling basic URLs like / or /v2 etc., and some services
return a 200 while some return other codes like 401. Our healthcheck test
simply checks the HTTP code against whatever the service normally
returns. This works OK and does catch basic service failure.
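As a rough sketch, that kind of status-code check in haproxy looks
something like the following (the backend name, addresses and expected
code here are just illustrative):
backend api_backend
    option httpchk GET /
    # expect whatever code the endpoint normally returns (200, 300, 401, ...)
    http-check expect status 200
    server api1 192.0.2.11:8774 check inter 5s
    server api2 192.0.2.12:8774 check inter 5s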
Our test environment is on flaky hardware and often fails in strange
ways: sometimes the port is open and basic URLs work, but real API calls
fail and time out, so our checks fall down here.
In a previous role I had, the developers added a URL (e.g. /healthcheck)
to each web application which went through and tested things like the db
connection being OK, memcached being accessible, etc., and returned a
200. This worked out really great for operations. I haven't seen
anything like this for OpenStack.
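Roughly, I mean something like the following (purely a hypothetical
sketch, not anything that exists in OpenStack today; the check functions
are illustrative stubs):

# Hypothetical sketch of a dedicated /healthcheck WSGI app.
def check_db():
    # e.g. run "SELECT 1" against the service database
    return True

def check_memcached():
    # e.g. set and get a sentinel key
    return True

def healthcheck_app(environ, start_response):
    checks = {"db": check_db, "memcached": check_memcached}
    failed = [name for name, check in checks.items() if not check()]
    status = "200 OK" if not failed else "503 Service Unavailable"
    body = (b"OK" if not failed
            else ("FAILED: " + ", ".join(failed)).encode("utf-8"))
    start_response(status, [("Content-Type", "text/plain"),
                            ("Content-Length", str(len(body)))])
    return [body]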
There's a healthcheck oslo.middleware plugin [1] available, so you could
possibly configure the service pipeline to include it, except it won't
exercise the db connection, RabbitMQ connection, and so on. But it would
help if you want to kick a service instance out of the load-balancer
without stopping the service completely [2].
[1]
http://docs.openstack.org/developer/oslo.middleware/healthcheck_plugins.html
[2]
http://docs.openstack.org/developer/oslo.middleware/healthcheck_plugins.html#disable-by-file
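The wiring is just paste-deploy configuration, roughly something like
the following in the service's api-paste.ini (the filter section follows
the docs in [1]; the pipeline section, file path and app names are only
illustrative and vary per project):

[filter:healthcheck]
paste.filter_factory = oslo_middleware:Healthcheck.factory
backends = disable_by_file
disable_by_file_path = /etc/myservice/healthcheck_disable

[pipeline:main]
pipeline = healthcheck authtoken apiapp

With the disable_by_file backend, touching the configured file makes the
healthcheck start returning 503 so the load-balancer drains the node
while the service keeps running; removing the file puts it back in
rotation [2].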
I'm wondering how everyone else does healthchecking of their clustered
services, and whether or not they think adding a dedicated healthcheck
URL would be beneficial?
From what I can tell, people are doing the same thing as you do: check
that a well-known location ('/', '/v2' or similar) returns the expected
code and hope that it will work for real user requests too.
Simon
We do use scripts similar to the ones in osops-tools-monitoring with
Nagios, which help with more complex testing, but I'm thinking of
something more lightweight specifically for setting up on loadbalancers.
cheers,
Andy
_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators