This can help you more easily view what the healthcheck middleware can
show (especially in detailed mode); for example, it can show thread
stacks and the like, which can be useful for debugging stuck servers
(similar in concept to Apache's mod_status).
https://review.openstack.org/#/c/311482/
Run the code from the above review like:
$ python oslo_middleware/healthcheck/ -p 8000
Then open a browser to http://127.0.0.1:8000/ (or whichever port you chose).
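You can also hit it from the command line; something like this should
work (IIRC the middleware content-negotiates on the Accept header, so
treat the exact output format as a sketch):

$ curl -H 'Accept: application/json' http://127.0.0.1:8000/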
-Josh
Joshua Harlow wrote:
Yup, I'm the one who made that healthcheck middleware more advanced.
If you need to do anything special with it, let me know and I can help
make that possible (or at least point out what would need to be changed
to do it).
Simon Pasquier wrote:
Hi,
On Thu, Apr 28, 2016 at 5:13 AM, Andy Botting <a...@andybotting.com> wrote:
We're running our services clustered behind an F5 load balancer in
production, and haproxy in our testing environment. This setup works
quite well for us, but I'm not that happy with how we test the health of
our endpoints.
We're currently calling basic URLs like / or /v2, and some services
return a 200 while others return other codes like 401. Our healthcheck
simply tests for whichever HTTP code each service normally returns. This
works OK and does catch basic service failures.
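For example, a haproxy check for this pattern looks roughly like the
following (backend name, address, path, and timings here are just
illustrative):

backend api_servers
    option httpchk GET /v2
    http-check expect status 200
    server ctl1 192.168.0.11:5000 check inter 2000 rise 2 fall 3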
Our test environment is on flaky hardware and often fails in strange
ways: sometimes the port is open and basic URLs work, but real API
calls fail and time out, so our checks fall down here.
In a previous role, the developers added a URL (e.g. /healthcheck) to
each web application which went through and tested things like whether
the DB connection was OK, whether memcached was accessible, etc., and
returned a 200. This worked out really well for operations. I haven't
seen anything like this for OpenStack.
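Roughly, I mean something like this (a made-up sketch; the check
helpers are placeholders, not from any real service):

import json

def _check_db():
    # Placeholder: replace with e.g. a 'SELECT 1' round-trip.
    return True

def _check_memcached():
    # Placeholder: replace with a memcached get/set round-trip.
    return True

def healthcheck_app(environ, start_response):
    # Run each dependency check; report 200 only if all of them pass.
    results = {'db': _check_db(), 'memcached': _check_memcached()}
    status = '200 OK' if all(results.values()) else '503 Service Unavailable'
    body = json.dumps(results).encode('utf-8')
    start_response(status, [('Content-Type', 'application/json'),
                            ('Content-Length', str(len(body)))])
    return [body]

# e.g. wsgiref.simple_server.make_server('', 8000, healthcheck_app).serve_forever()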
There's a healthcheck oslo.middleware plugin [1] available, so you could
configure the service pipeline to include it, except that it won't
exercise the DB connection, RabbitMQ connection, and so on. But it would
help if you want to kick a service instance out of the load balancer
without stopping the service completely [2].
[1]
http://docs.openstack.org/developer/oslo.middleware/healthcheck_plugins.html
[2]
http://docs.openstack.org/developer/oslo.middleware/healthcheck_plugins.html#disable-by-file
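For example, the paste pipeline filter would be configured roughly like
this (the disable file path is just an example):

[filter:healthcheck]
paste.filter_factory = oslo_middleware:Healthcheck.factory
backends = disable_by_file
disable_by_file_path = /var/run/nova/healthcheck_disable

Touching that file makes the healthcheck URL return a 503, so the load
balancer drains the node while the service keeps running; removing the
file puts it back in rotation.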
I'm wondering how everyone else does healthchecking of their clustered
services, and whether or not they think adding a dedicated healthcheck
URL would be beneficial?
From what I can tell, people are doing the same thing as you: check
that a well-known location ('/', '/v2', or similar) returns the expected
code and hope that it will work for real user requests too.
Simon
We do use scripts similar to the ones in osops-tools-monitoring with
Nagios, which help with more complex testing, but I'm thinking of
something more lightweight specifically for use on load balancers.
cheers,
Andy
_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators