That's a good question, and I'm not really sure of the historical reasons why they are not; maybe someone with more historical wisdom will chime in.

I know that I put up https://review.openstack.org/#/c/12759/ many years ago (commentary there may be useful in historical investigation)...

Andy Botting wrote:
Thanks to Simon, Josh and Kris who replied to my last email about the
healthcheck middleware - these are now working well for us.

I'm sure there are plenty of operators, like us, who didn't know this
existed.

Is there any reason why they're not enabled by default?

cheers,
Andy

On 30 April 2016 at 11:52, Joshua Harlow <harlo...@fastmail.com
<mailto:harlo...@fastmail.com>> wrote:

    This can help you more easily view what the healthcheck middleware
    can also show (especially in detailed mode); it can show thread
    stacks and the like, which can be useful for debugging stuck
    servers (similar in concept to Apache mod_status).

    https://review.openstack.org/#/c/311482/

    Run the above review like:

    $ python oslo_middleware/healthcheck/ -p 8000

    Then open a browser to http://127.0.0.1:8000/ (or other port).

    -Josh


    Joshua Harlow wrote:

        Yup, that healthcheck middleware was made more advanced by me.

        If you need to do anything special with it, let me know and I
        can help make that possible (or at least point out what might
        need to change to do that).

        Simon Pasquier wrote:

            Hi,

            On Thu, Apr 28, 2016 at 5:13 AM, Andy Botting
            <a...@andybotting.com <mailto:a...@andybotting.com>
            <mailto:a...@andybotting.com <mailto:a...@andybotting.com>>>
            wrote:

            We're running our services clustered behind an F5
            loadbalancer in
            production, and haproxy in our testing environment. This
            setup works
            quite well for us, but I'm not that happy with testing the
            health of
            our endpoints.

            We're currently calling basic URLs like / or /v2 etc., and
            some services return a 200 while others return other codes
            like 401. Our healthcheck test simply checks the HTTP code
            returned. This works OK and does catch basic service
            failure.

            Our test environment is on flaky hardware and often fails
            in strange ways: sometimes the port is open and basic URLs
            work, but real API calls fail and time out, so our checks
            fall down here.

            In a previous role I had, the developers added a URL (e.g.
            /healthcheck) to each web application which went through
            and tested things like whether the db connection was OK,
            memcached was accessible, etc., and returned a 200. This
            worked out really well for operations. I haven't seen
            anything like this for OpenStack.


            There's a healthcheck oslo.middleware plugin [1] available,
            so you could configure the service pipeline to include it,
            except it won't exercise the db connection, RabbitMQ
            connection, and so on. But it would help if you want to
            kick a service instance out of the load balancer without
            stopping the service completely [2].

            [1] http://docs.openstack.org/developer/oslo.middleware/healthcheck_plugins.html

            [2] http://docs.openstack.org/developer/oslo.middleware/healthcheck_plugins.html#disable-by-file
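            To make the disable-by-file idea concrete, here is a rough,
            self-contained sketch of what such middleware does. This is
            plain WSGI, not the actual oslo.middleware code; the route
            and file path below are made up for illustration:

```python
import os


def healthcheck_middleware(app, path='/healthcheck',
                           disable_file='/etc/myservice/healthcheck_disable'):
    """Wrap a WSGI app and answer health probes before they reach it.

    Illustrative sketch only: the real oslo.middleware plugin is
    configured through the paste pipeline (see links [1] and [2]).
    """
    def wrapper(environ, start_response):
        if environ.get('PATH_INFO') == path:
            if os.path.exists(disable_file):
                # Operator touched the disable file: tell the load
                # balancer to drain this instance, while the service
                # itself keeps running.
                start_response('503 Service Unavailable',
                               [('Content-Type', 'text/plain')])
                return [b'DISABLED BY FILE']
            start_response('200 OK', [('Content-Type', 'text/plain')])
            return [b'OK']
        # Anything else passes through to the wrapped application.
        return app(environ, start_response)
    return wrapper
```

            The point of the design is that removing an instance from
            rotation is a file touch rather than a service restart.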


            I'm wondering how everyone else does healthchecking of
            their clustered services, and whether or not they think
            adding a dedicated healthcheck URL would be beneficial?


            From what I can tell, people are doing the same thing as
            you do: check that a well-known location ('/', '/v2' or
            similar) returns the expected code and hope that it will
            work for real user requests too.
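            That style of probe is simple to script; here is a minimal
            sketch using only the Python standard library (the URL and
            expected status are placeholders, not fixed values):

```python
import urllib.error
import urllib.request


def probe(url, expected_status=200, timeout=5):
    """Return True if `url` answers with the expected HTTP status.

    Mirrors the common load-balancer check described above: hit a
    well-known path and compare status codes. Note that a 401 from an
    auth-protected root still counts as "alive" if that is the code
    you expect.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == expected_status
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx, but the code is still an answer.
        return exc.code == expected_status
    except (urllib.error.URLError, OSError):
        # Connection refused / timeout: the service is down.
        return False
```

            As the thread notes, a passing probe like this only proves
            the port answers, not that real API calls will succeed.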

            Simon


            We do use scripts similar to the ones in
            osops-tools-monitoring in Nagios, which help with more
            complex testing, but I'm thinking of something more
            lightweight, specifically for setting up on load balancers.

            cheers,
            Andy

_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
