On Wed, Dec 16, 2020 at 3:27 PM Christopher Schultz <ch...@christopherschultz.net> wrote:

> > We have a self-monitoring script which runs on server and when the server
> > is not working properly it does a log save and the service restart.
>
> How do you detect this state? Just make a request and if you get "No
> data received" from curl, you restart the server?
>

If there is an error code or the specific text doesn't appear in the
response, we save the state and do /etc/init.d/tomcat restart.
The full script is:

#!/bin/bash
serverFailure=0
cd /root
rm /root/numbeo_test.out
#wget -t 1 -T 5 --no-proxy --no-cache --cache=off -q 'localhost:8080/cost-of-living/city_result.jsp?country=Ireland&city=Dublin' -O /root/numbeo_test.out
#curl -L -m 2 -v -o /root/numbeo_test.out --trace curl.log 'localhost:8008/cost-of-living/in/Dublin'
curl -L -m 2 -v --insecure -o /root/numbeo_test.out --trace curl.log 'https://localhost:8181/cost-of-living/in/Dublin'
wgetOutput=$?
grep -q "entries in the past" /root/numbeo_test.out
if [ $? != 0 ]; then
    cd /root
    rm /root/numbeo_test.out
    sleep 10s
    #wget -t 2 -T 2 --no-proxy --no-cache --cache=off -q 'localhost:8080/cost-of-living/city_result.jsp?country=Ireland&city=Dublin' -O /root/numbeo_test.out
    #curl -L -m 2 --retry 1 -v -o /root/numbeo_test.out --trace curl.log 'localhost:8008/cost-of-living/in/Dublin'
    curl -L -m 2 -v --insecure -o /root/numbeo_test.out --trace curl.log 'https://localhost:8181/cost-of-living/in/Dublin'
    wgetOutput=$?
    grep -q "entries in the past" /root/numbeo_test.out
    if [ $? != 0 ]; then
        #echo 'server is down!';
        ps -eo pid,comm | while read pid command
        do
            if [[ "$command" = "java" ]]
            then
                echo $pid
                DATE=`date +%Y-%m-%d`
                echo ${wgetOutput} > ~/wget_${DATE}_${pid}.log
                cp /root/numbeo_test.out ~/numbeo_test_out_${DATE}_${pid}.log
                jstack -J-d64 -F $pid > ~/jstack_${DATE}_${pid}.log
                iostat > ~/iostat_${DATE}_${pid}.log
                vmstat > ~/vmstat_${DATE}_${pid}.log
                netstat -tnp > ~/netstat_${DATE}_${pid}.log
                netstat -anp |grep 'tcp\|udp' | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n > ~/netstat_anp_outline_${DATE}_${pid}.log
                ps aux > ~/ps_aux_${DATE}_${pid}.log
                tail -n 5000 ~glassfish/apache-tomcat-8.5.5/logs/catalina.out > ~/catalina_out_${DATE}_${pid}.log
                break
            fi
        done
        echo 'too many server failures... going to rebootsoftly' >> ~/reboot.log ;
        date | mail -s "Numbeo soft reset" mladen.adamo...@gmail.com
        date >> ~/reboot.log
        killall -9 java
        /root/fix_letsencrypt_chmod.sh
        #/etc/init.d/glassfish start
        /etc/init.d/tomcat start
        #reboot
    fi
fi

> I see you are using Let's Encrypt. How are you managing the rotating of
> the keys and certificates?
>

Crontab:
5 1 1 * * /root/renew_cert_numbeo.sh

root@condor1796 ~ # cat renew_cert_numbeo.sh
#!/bin/bash
mkdir -p /tmp/letsencrypt/public_html
certbot certonly -n --force-renewal --webroot --webroot-path /tmp/letsencrypt/public_html -d numbeo.com -d www.numbeo.com \
    -d es.numbeo.com -d pt.numbeo.com -d fr.numbeo.com -d ru.numbeo.com -d ja.numbeo.com -d de.numbeo.com -d nl.numbeo.com \
    -d it.numbeo.com -d zh.numbeo.com -d ar.numbeo.com -d jobs.numbeo.com \
    --agree-tos --email mladen.adamo...@gmail.com
/root/fix_letsencrypt_chmod.sh
if [ $? != 0 ]; then
    date | mail -s "Lets encrypt renew certificate fails for numbeo.com" mladen.adamo...@gmail.com
else
    /etc/init.d/tomcat restart
fi
root@condor1796 ~ # cat fix_letsencrypt_chmod.sh
#!/bin/bash
chmod o+rx /etc/letsencrypt
chmod -R o+rx /etc/letsencrypt/*
root@condor1796 ~ #

> > *What would be the next steps how to identify the problem and perhaps
> > solve it?*
> What have you done so far?
>

aaah... reading the Tomcat source to try to understand the state of Threads.

> I don't see anything that sticks out in your thread dump.
>

There are several threads which are trying to get the monitor in
AprEndpoint$Poller.add, and no thread seems to be blocking them.
Don't you find that weird?

root@condor1796 ~ # grep Poller jstack_2020-12-16_31415.log | grep "Apr"
 - org.apache.tomcat.util.net.AprEndpoint$Poller.add(long, long, int) @bci=102, line=1398 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.access$500(org.apache.tomcat.util.net.AprEndpoint$Poller, long, long, int) @bci=5, line=1157 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.add(long, long, int) @bci=102, line=1398 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.access$500(org.apache.tomcat.util.net.AprEndpoint$Poller, long, long, int) @bci=5, line=1157 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.add(long, long, int) @bci=102, line=1398 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.access$500(org.apache.tomcat.util.net.AprEndpoint$Poller, long, long, int) @bci=5, line=1157 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.add(long, long, int) @bci=102, line=1398 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.access$500(org.apache.tomcat.util.net.AprEndpoint$Poller, long, long, int) @bci=5, line=1157 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.add(long, long, int) @bci=102, line=1398 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.access$500(org.apache.tomcat.util.net.AprEndpoint$Poller, long, long, int) @bci=5, line=1157 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.add(long, long, int) @bci=102, line=1398 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.access$500(org.apache.tomcat.util.net.AprEndpoint$Poller, long, long, int) @bci=5, line=1157 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.add(long, long, int) @bci=102, line=1398 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.access$500(org.apache.tomcat.util.net.AprEndpoint$Poller, long, long, int) @bci=5, line=1157 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.run() @bci=1713, line=1799 (Compiled frame; information may be imprecise)

Thread 22685: (state = BLOCKED)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.add(long, long, int) @bci=102, line=1398 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.access$500(org.apache.tomcat.util.net.AprEndpoint$Poller, long, long, int) @bci=5, line=1157 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$AprSocketWrapper.registerReadInterest() @bci=48, line=2560 (Compiled frame)
 - org.apache.coyote.AbstractProtocol$ConnectionHandler.process(org.apache.tomcat.util.net.SocketWrapperBase, org.apache.tomcat.util.net.SocketEvent) @bci=643, line=870 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun() @bci=15, line=2241 (Compiled frame)
 - org.apache.tomcat.util.net.SocketProcessorBase.run() @bci=21, line=49 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1142 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 (Interpreted frame)
 - org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run() @bci=4, line=61 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Compiled frame)

The Poller
add(...) method is as follows:

private void add(long socket, long timeout, int flags) {
    if (log.isDebugEnabled()) {
        String msg = sm.getString("endpoint.debug.pollerAdd",
                Long.valueOf(socket), Long.valueOf(timeout),
                Integer.valueOf(flags));
        if (log.isTraceEnabled()) {
            log.trace(msg, new Exception());
        } else {
            log.debug(msg);
        }
    }
    if (timeout <= 0) {
        // Always put a timeout in
        timeout = Integer.MAX_VALUE;
    }
    synchronized (this) {
        // Add socket to the list. Newly added sockets will wait
        // at most for pollTime before being polled.
        if (addList.add(socket, timeout, flags)) {
            this.notify();
        }
    }
}

Don't you find it weird that all these threads are trying to synchronize
on a Poller instance and none of them is inside this block or another
synchronized block/method?

>
> > Attaching to process ID 27753, please wait...
> > Debugger attached successfully.
> > Server compiler detected.
> > JVM version is 25.101-b13
> > Deadlock Detection:
>
> That JVM seems fairly old, too. Consider upgrading to latest Java 8 VM
> (or beyond, if appropriate).
>
> -chris
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org
>
>
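
To see how many threads are parked at the same top frame, a tally along
these lines could be run over one of the saved dumps. This is only a
minimal sketch: the function name summarize_top_frames is made up for
illustration, and it assumes the dump has the `jstack -F` layout quoted
above (a "Thread <id>: (state = ...)" header followed by " - frame" lines).

```shell
#!/bin/bash
# Hypothetical helper (not part of the monitoring script): count threads
# per (state, top stack frame) in a `jstack -F` style dump.
summarize_top_frames() {
    awk '
        # Thread header, e.g. "Thread 22685: (state = BLOCKED)"
        /^Thread [0-9]+: \(state = / {
            state = $0
            sub(/^.*state = /, "", state)
            sub(/\).*$/, "", state)
            want_frame = 1          # the next " - ..." line is the top frame
            next
        }
        # First frame after a header; drop the "@bci=..., line=..." suffix
        want_frame && /^ *- / {
            frame = $0
            sub(/^ *- /, "", frame)
            sub(/ @bci=.*$/, "", frame)
            count[state " " frame]++
            want_frame = 0
        }
        END {
            for (key in count) print count[key], key
        }
    ' "$1" | sort -rn
}
```

Calling it as `summarize_top_frames ~/jstack_2020-12-16_31415.log` prints
one line per distinct state/top-frame pair, most frequent first, so the
threads sitting in Poller.add are grouped together.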