i was right to not hold my breath...  while my apache changes seem to have
helped a bit, things are still slowing down after 10-12 hours.

i have a few other things i can look at, and will get as much done as
possible before the weekend.  serious troubleshooting will begin anew
monday.

apologies again,

shane

On Thu, Mar 21, 2019 at 12:54 PM shane knapp <[email protected]> wrote:

> i tweaked some apache settings (increased MaxClients to fix an error i
> found buried in the logs, and added 'retry' and 'acquire' to the reverse
> proxy settings to hopefully combat the dreaded 502 responses), restarted
> httpd, and things actually seem quite snappy right now!
>
> i'm not holding my breath, however...  only time will tell.
>
> On Tue, Mar 19, 2019 at 7:18 AM Imran Rashid <[email protected]> wrote:
>
>> seems wedged again?
>>
>> sorry for the bad news, Shane, and thanks for all the work on fixing it
>>
>> On Mon, Mar 18, 2019 at 4:02 PM shane knapp <[email protected]> wrote:
>>
>>> ok, i dug through the logs and noticed that rsyslogd was dropping
>>> messages due to imuxsock being spammed by postfix...  which i then tracked
>>> down to our installation of fail2ban being incorrectly configured and
>>> attempting to send IP ban/unban status emails to '[email protected]'.
>>>
>>> since we're a university, and especially one w/a reputation like ours,
>>> we are constantly under attack.  the logs of the attempted dictionary
>>> attacks would astound you in their size and scope.  since we have so many
>>> ban/unban actions happening for all of these unique IP addresses, each of
>>> which generates an email that was directed to an invalid address, we ended
>>> up w/well over 100M of plain-text messages waiting in the mail queue.
>>> postfix was continually trying to send these messages, which was causing
>>> the system to behave strangely, including breaking rsyslogd.
>>>
>>> so, i disabled email reports in fail2ban, restarted the impacted
>>> services, picked my sysadmin's brain and then purged the mail queue (when
>>> was the last time anyone actually used postfix?).  jenkins now seems to be
>>> behaving (maybe?).
>>>
>>> i'm not entirely sure that this will fix the strange GUI hangs, but all
>>> reports i found on stackoverflow and other sites detail strange system
>>> behavior across the board when rsyslogd starts dropping messages.  at the
>>> very least we won't be (potentially) losing system-level log messages
>>> anymore, which might actually help me track down what's happening if
>>> jenkins gets wedged again.
>>>
>>> and finally, the obligatory IT Crowd clip:
>>> https://www.youtube.com/watch?v=5UT8RkSmN4k
>>>
>>> shane (who expects jenkins to crash within 5 minutes of this email going
>>> out)
>>>
>>> On Fri, Mar 15, 2019 at 8:22 PM Sean Owen <[email protected]> wrote:
>>>
>>>> It's not responding again. Is there any way to kick it harder? I know
>>>> it's well understood, but this means not much can be merged in Spark.
>>>>
>>>> On Fri, Mar 15, 2019 at 12:08 PM shane knapp <[email protected]>
>>>> wrote:
>>>> >
>>>> > well, that box rebooted in record time!  we're back up and building.
>>>> >
>>>> > and as always, i'll keep a close eye on things today...  jenkins
>>>> > usually works great, until it doesn't.  :\
>>>> >
>>>> > On Fri, Mar 15, 2019 at 9:52 AM shane knapp <[email protected]>
>>>> wrote:
>>>> >>
>>>> >> as some of you may have noticed, jenkins got itself in a bad state
>>>> >> multiple times over the past couple of weeks.  usually restarting the
>>>> >> service is sufficient, but it appears that i need to hit it w/the reboot
>>>> >> hammer.
>>>> >>
>>>> >> jenkins will be down for the next 20-30 minutes as the node reboots
>>>> >> and jenkins spins back up.  i'll reply here w/any updates.
>>>> >>
>>>> >> shane
>>>> >> --
>>>> >> Shane Knapp
>>>> >> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>> >> https://rise.cs.berkeley.edu
>>>> >
>>>>
>>>
>>
>


