On 21/02/17 15:11, Tom Fifield wrote:


On 14/02/17 16:19, Joshua Hesketh wrote:


On Tue, Feb 14, 2017 at 7:15 PM, Tom Fifield <t...@openstack.org> wrote:

    On 14/02/17 16:11, Joshua Hesketh wrote:

        Hey Tom,

        Where is that script being fired from (a quick grep doesn't find
        it), or is it a tool people are using?

        If it's a tool we'd need to make sure whoever is using it gets a
        new version to rule it out.


    Indeed.


    It's fired from a PHP service on www.openstack.org itself, which
    writes to the Member database:

    https://github.com/OpenStackweb/openstack-org/blob/master/auc-metrics/code/services/ActiveModeratorService.php




Right. I wonder if somebody could check the logs to see if the process
times out. Sadly, looking at that code, it looks like any output messages
from the script will be discarded.
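
A minimal sketch of how a wrapper could keep the script's output and notice a timeout instead of discarding both. It is written in Python rather than the PHP the service actually uses, and the function name, the timeout value and the returned dict are assumptions for illustration, not the code in ActiveModeratorService.php:

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd, keep its combined stdout/stderr, and flag a timeout.

    Hypothetical helper: the name and the return shape are illustrative only.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into stdout so nothing is lost
            universal_newlines=True,   # decode output as text
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        # The child is killed on timeout; keep whatever it printed so far.
        out = exc.output or ""
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return {"timed_out": True, "returncode": None, "output": out}
    return {
        "timed_out": False,
        "returncode": proc.returncode,
        "output": proc.stdout,
    }


if __name__ == "__main__":
    # Stand-in command; the real job would call get_active_moderator.py.
    result = run_with_timeout([sys.executable, "-c", "print('hello')"], 30)
    print(result)
```

Writing the returned dict to a log on each run would show whether the job is hitting its time limit during the outage window.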


... and my patch was deployed, but the site is down today. So, looks
like it wasn't that.

Though, is it staying down for less time? It came back up just now; normally it'd be down for another 45 minutes.

Interesting traffic spikes at:
http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=2549&rra_id=all

seem to correlate with the outage. Perhaps we can set up some tcpdumps?



    The next step is to update the copy of the script it references:


    https://github.com/OpenStackweb/openstack-org/blob/master/auc-metrics/lib/uc-recognition/tools/get_active_moderator.py


    I am not sure if this is in place using git submodules or manually,
    but will figure it out and get that updated.




         - Josh

        On Tue, Feb 14, 2017 at 7:07 PM, Tom Fifield <t...@openstack.org> wrote:

            On 14/02/17 16:06, Joshua Hesketh wrote:

                Hey,

                I've brought the service back up, but have no new clues
                as to why.


            Cheers.

            Going to try: https://review.openstack.org/#/c/433478/
            to see if this script is the culprit.


                - Josh

                On Tue, Feb 14, 2017 at 6:50 PM, Tom Fifield
                <t...@openstack.org> wrote:

                    On 10/02/17 22:39, Jeremy Stanley wrote:

                        On 2017-02-10 16:08:51 +0800 (+0800), Tom Fifield wrote:
                        [...]

                            Down again, this time with "Network is unreachable".

                        [...]

                        I'm not finding any obvious errors on the server nor
                        relevant maintenance notices/trouble tickets from the
                        service provider to explain this. I do see conspicuous
                        gaps in network traffic volume and system load from
                        ~06:45 to ~08:10 UTC according to cacti:

                        http://cacti.openstack.org/?tree_id=1&leaf_id=156

                        Skipping back through previous days I find some similar
                        gaps starting anywhere from 06:30 to 07:00 and ending
                        between 07:00 and 08:00 but they don't seem to occur
                        every day and I'm not having much luck finding a
                        pattern. It _is_ conspicuously close to when
                        /etc/cron.daily scripts get fired from the crontab so
                        might coincide with log rotation/service restarts? The
                        graphs don't show these gaps correlating with any spikes
                        in CPU, memory or disk activity so it doesn't seem to be
                        resource starvation (at least not for any common
                        resources we're tracking).


                    Indeed. It's down again today during the same timeslot.

                    Another idea for the cron-based theory:




                    https://github.com/openstack/uc-recognition/blob/master/tools/get_active_moderator.py


                    loops through the list of Ask OpenStack users via the API
                    on a cron running on www.openstack.org. Not sure when that
                    cron runs, but if it's similar, this could potentially be
                    a high-load generator.




                    Regards,


                    Tom


                    _______________________________________________
                    OpenStack-Infra mailing list
                    OpenStack-Infra@lists.openstack.org
                    http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra






