i was right to not hold my breath... while my apache changes seem to have helped a bit, things are still slowing down after 10-12 hours.
i have a few other things i can look at, and will get as much done as possible before the weekend. serious troubleshooting will begin anew monday.

apologies again,

shane

On Thu, Mar 21, 2019 at 12:54 PM shane knapp <[email protected]> wrote:

> i tweaked some apache settings (MaxClients increased to fix an error i
> found buried in the logs, and added 'retry' and 'acquire' to the reverse
> proxy settings to hopefully combat the dreaded 502 response), restarted
> httpd and things actually seem quite snappy right now!
>
> i'm not holding my breath, however... only time will tell.
>
> On Tue, Mar 19, 2019 at 7:18 AM Imran Rashid <[email protected]> wrote:
>
>> seems wedged again?
>>
>> sorry for the bad news Shane, thanks for all the work on fixing it
>>
>> On Mon, Mar 18, 2019 at 4:02 PM shane knapp <[email protected]> wrote:
>>
>>> ok, i dug through the logs and noticed that rsyslogd was dropping
>>> messages due to imuxsock being spammed by postfix... which i then tracked
>>> down to our installation of fail2ban being incorrectly configured and
>>> attempting to send IP ban/unban status emails to '[email protected]'.
>>>
>>> since we're a university, and especially one w/a reputation like ours,
>>> we are constantly under attack. the logs of the attempted dictionary
>>> attacks would astound you in their size and scope. since we have so many
>>> ban/unban actions happening for all of these unique IP addresses, each of
>>> which generates an email that was directed to an invalid address, we ended
>>> up w/well over 100M of plain-text messages waiting in the mail queue.
>>> postfix was continually trying to send these messages, which was causing
>>> the system to behave strangely, including breaking rsyslogd.
>>>
>>> so, i disabled email reports in fail2ban, restarted the impacted
>>> services, picked my sysadmin's brain and then purged the mail queue (when
>>> was the last time anyone actually used postfix?). jenkins now seems to be
>>> behaving (maybe?).
>>>
>>> i'm not entirely sure that this will fix the strange GUI hangs, but all
>>> reports i found on stackoverflow and other sites detail strange system
>>> behavior across the board when rsyslogd starts dropping messages. at the
>>> very least we won't be (potentially) losing system-level log messages
>>> anymore, which might actually help me track down what's happening if
>>> jenkins gets wedged again.
>>>
>>> and finally, the obligatory IT Crowd clip:
>>> https://www.youtube.com/watch?v=5UT8RkSmN4k
>>>
>>> shane (who expects jenkins to crash within 5 minutes of this email going
>>> out)
>>>
>>> On Fri, Mar 15, 2019 at 8:22 PM Sean Owen <[email protected]> wrote:
>>>
>>>> It's not responding again. Is there any way to kick it harder? I know
>>>> it's well understood but this means not much can be merged in Spark
>>>>
>>>> On Fri, Mar 15, 2019 at 12:08 PM shane knapp <[email protected]> wrote:
>>>> >
>>>> > well, that box rebooted in record time! we're back up and building.
>>>> >
>>>> > and as always, i'll keep a close eye on things today... jenkins
>>>> > usually works great, until it doesn't. :\
>>>> >
>>>> > On Fri, Mar 15, 2019 at 9:52 AM shane knapp <[email protected]> wrote:
>>>> >>
>>>> >> as some of you may have noticed, jenkins got itself in a bad state
>>>> >> multiple times over the past couple of weeks. usually restarting the
>>>> >> service is sufficient, but it appears that i need to hit it w/the
>>>> >> reboot hammer.
>>>> >>
>>>> >> jenkins will be down for the next 20-30 minutes as the node reboots
>>>> >> and jenkins spins back up. i'll reply here w/any updates.
>>>> >>
>>>> >> shane
>>>> >> --
>>>> >> Shane Knapp
>>>> >> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>> >> https://rise.cs.berkeley.edu
