On Wed, Dec 16, 2020 at 3:27 PM Christopher Schultz <
ch...@christopherschultz.net> wrote:

> > We have a self-monitoring script which runs on server and when the server
> > is not working properly it does a log save and the service restart.
>
> How do you detect this state? Just make a request and if you get "No
> data received" from curl, you restart the server?
>

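If the expected text doesn't appear in the response (which also covers curl
failing outright), the script waits 10 seconds and checks again. If the second
check fails too, it saves some diagnostics, kills Java and starts Tomcat again.
The full script is: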
#!/bin/bash
serverFailure=0
cd /root
rm /root/numbeo_test.out
#wget -t 1 -T 5 --no-proxy --no-cache --cache=off -q \
#  'localhost:8080/cost-of-living/city_result.jsp?country=Ireland&city=Dublin' \
#  -O /root/numbeo_test.out
#curl -L -m 2 -v  -o /root/numbeo_test.out --trace curl.log \
#  'localhost:8008/cost-of-living/in/Dublin'
curl -L -m 2 -v --insecure -o /root/numbeo_test.out --trace curl.log \
  'https://localhost:8181/cost-of-living/in/Dublin'
wgetOutput=$?

grep -q "entries in the past" /root/numbeo_test.out
if [ $? != 0 ]; then
cd /root
rm /root/numbeo_test.out
sleep 10s
#wget -t 2 -T 2 --no-proxy --no-cache --cache=off -q \
#  'localhost:8080/cost-of-living/city_result.jsp?country=Ireland&city=Dublin' \
#  -O /root/numbeo_test.out
  #curl -L -m 2 --retry 1 -v  -o /root/numbeo_test.out --trace curl.log \
  #  'localhost:8008/cost-of-living/in/Dublin'
  curl -L -m 2 -v --insecure -o /root/numbeo_test.out --trace curl.log \
    'https://localhost:8181/cost-of-living/in/Dublin'
  wgetOutput=$?
grep -q "entries in the past" /root/numbeo_test.out

if [ $? != 0 ]; then
#echo 'server is down!';
ps -eo pid,comm | while read pid command
do
   if [[ "$command" = "java" ]]
       then
               echo $pid
               DATE=`date +%Y-%m-%d`
               echo ${wgetOutput} > ~/wget_${DATE}_${pid}.log
               cp /root/numbeo_test.out ~/numbeo_test_out_${DATE}_${pid}.log
               jstack -J-d64 -F $pid > ~/jstack_${DATE}_${pid}.log
               iostat > ~/iostat_${DATE}_${pid}.log
               vmstat > ~/vmstat_${DATE}_${pid}.log
               netstat -tnp > ~/netstat_${DATE}_${pid}.log
               netstat -anp | grep 'tcp\|udp' | awk '{print $5}' | cut -d: -f1 \
                   | sort | uniq -c | sort -n > ~/netstat_anp_outline_${DATE}_${pid}.log
               ps aux > ~/ps_aux_${DATE}_${pid}.log
               tail -n 5000 ~glassfish/apache-tomcat-8.5.5/logs/catalina.out \
                   > ~/catalina_out_${DATE}_${pid}.log
               break
   fi
done
echo 'too many server failures... going to reboot softly' >> ~/reboot.log
date | mail -s "Numbeo soft reset" mladen.adamo...@gmail.com
date >> ~/reboot.log
killall -9 java
/root/fix_letsencrypt_chmod.sh
#/etc/init.d/glassfish start
/etc/init.d/tomcat start
#reboot
fi
fi
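
The check-and-retry part of the script above could be factored into two small functions. This is only a sketch: `response_ok` and `check_with_retry` are hypothetical names that do not exist in the script on the server, and the fetch command is passed in as arguments so that curl (or anything else) can be swapped in. The marker text and the 10-second delay are the ones used above.

```shell
#!/bin/bash
# Sketch only: function names are assumptions, not part of the original script.
MARKER="entries in the past"

# Succeeds if the saved response is non-empty and contains the marker text.
response_ok() {
    local file=$1
    [ -s "$file" ] && grep -q "$MARKER" "$file"
}

# check_with_retry ATTEMPTS DELAY_SECONDS OUTFILE FETCH_CMD [ARGS...]
# Runs FETCH_CMD (which must write the response to OUTFILE) up to ATTEMPTS
# times, sleeping DELAY_SECONDS between attempts. Returns 0 on the first
# response that contains the marker, 1 if every attempt fails.
check_with_retry() {
    local attempts=$1 delay=$2 out=$3
    shift 3
    local i
    for ((i = 1; i <= attempts; i++)); do
        rm -f "$out"
        "$@" > /dev/null 2>&1
        if response_ok "$out"; then
            return 0
        fi
        if (( i < attempts )); then
            sleep "$delay"
        fi
    done
    return 1
}
```

With these definitions the two probes reduce to something like `check_with_retry 2 10 /root/numbeo_test.out curl -L -m 2 --insecure -o /root/numbeo_test.out 'https://localhost:8181/cost-of-living/in/Dublin'`, and the diagnostics and restart run only when that call returns non-zero.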


> I see you are using Let's Encrypt. How are you managing the rotating of
> the keys and certificates?
>

Crontab: 5   1  1   *   *     /root/renew_cert_numbeo.sh
root@condor1796 ~ # cat renew_cert_numbeo.sh
#!/bin/bash

mkdir -p /tmp/letsencrypt/public_html
certbot certonly -n --force-renewal --webroot \
        --webroot-path /tmp/letsencrypt/public_html \
        -d numbeo.com -d www.numbeo.com \
        -d es.numbeo.com -d pt.numbeo.com -d fr.numbeo.com -d ru.numbeo.com \
        -d ja.numbeo.com -d de.numbeo.com -d nl.numbeo.com \
        -d it.numbeo.com -d zh.numbeo.com -d ar.numbeo.com -d jobs.numbeo.com \
        --agree-tos --email mladen.adamo...@gmail.com

/root/fix_letsencrypt_chmod.sh
if [ $? != 0 ]; then
   date | mail -s "Lets encrypt renew certificate fails for numbeo.com" \
       mladen.adamo...@gmail.com
else
   /etc/init.d/tomcat restart
fi

root@condor1796 ~ # cat fix_letsencrypt_chmod.sh
#!/bin/bash
chmod o+rx /etc/letsencrypt
chmod -R o+rx /etc/letsencrypt/*

root@condor1796 ~ #
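
Note that `$?` always holds the exit status of the most recently executed command, so in renew_cert_numbeo.sh the `if [ $? != 0 ]` test looks at fix_letsencrypt_chmod.sh rather than at certbot. If the intent is to send the mail when certbot itself fails, the status has to be saved right after certbot runs. A minimal sketch, where `renew_and_restart` and its four arguments are hypothetical stand-ins for the commands in the script:

```shell
#!/bin/bash
# Sketch only: the function name and its arguments are assumptions for illustration.
# Usage: renew_and_restart RENEW_CMD FIX_CMD RESTART_CMD NOTIFY_CMD
renew_and_restart() {
    local renew_cmd=$1 fix_cmd=$2 restart_cmd=$3 notify_cmd=$4
    local status=0

    # Save the renewal command's exit status before anything else overwrites $?.
    "$renew_cmd" || status=$?

    # Fix permissions regardless of the renewal result, as the original script does.
    "$fix_cmd"

    if [ "$status" -ne 0 ]; then
        # Pass the failing status to the notifier and skip the restart.
        "$notify_cmd" "$status"
        return "$status"
    fi
    "$restart_cmd"
}
```

Each argument can be a small wrapper function, for example one that runs the certbot command above and another that pipes `date` into `mail`.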



> > *What would be the next steps how to identify the problem and perhaps
> > solve it?*
> What have you done so far?
>

aaah... reading the Tomcat source to try to understand the state of Threads.

> I don't see anything that sticks out in your thread dump.
>

Several threads are trying to acquire the monitor in AprEndpoint$Poller.add,
yet no thread appears to be holding it. Don't you find that weird?

root@condor1796 ~ # grep Poller jstack_2020-12-16_31415.log  | grep "Apr"
 - org.apache.tomcat.util.net.AprEndpoint$Poller.add(long, long, int) @bci=102, line=1398 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.access$500(org.apache.tomcat.util.net.AprEndpoint$Poller, long, long, int) @bci=5, line=1157 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.add(long, long, int) @bci=102, line=1398 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.access$500(org.apache.tomcat.util.net.AprEndpoint$Poller, long, long, int) @bci=5, line=1157 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.add(long, long, int) @bci=102, line=1398 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.access$500(org.apache.tomcat.util.net.AprEndpoint$Poller, long, long, int) @bci=5, line=1157 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.add(long, long, int) @bci=102, line=1398 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.access$500(org.apache.tomcat.util.net.AprEndpoint$Poller, long, long, int) @bci=5, line=1157 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.add(long, long, int) @bci=102, line=1398 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.access$500(org.apache.tomcat.util.net.AprEndpoint$Poller, long, long, int) @bci=5, line=1157 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.add(long, long, int) @bci=102, line=1398 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.access$500(org.apache.tomcat.util.net.AprEndpoint$Poller, long, long, int) @bci=5, line=1157 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.add(long, long, int) @bci=102, line=1398 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.access$500(org.apache.tomcat.util.net.AprEndpoint$Poller, long, long, int) @bci=5, line=1157 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.run() @bci=1713, line=1799 (Compiled frame; information may be imprecise)


Thread 22685: (state = BLOCKED)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.add(long, long, int) @bci=102, line=1398 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$Poller.access$500(org.apache.tomcat.util.net.AprEndpoint$Poller, long, long, int) @bci=5, line=1157 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$AprSocketWrapper.registerReadInterest() @bci=48, line=2560 (Compiled frame)
 - org.apache.coyote.AbstractProtocol$ConnectionHandler.process(org.apache.tomcat.util.net.SocketWrapperBase, org.apache.tomcat.util.net.SocketEvent) @bci=643, line=870 (Compiled frame)
 - org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun() @bci=15, line=2241 (Compiled frame)
 - org.apache.tomcat.util.net.SocketProcessorBase.run() @bci=21, line=49 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1142 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=617 (Interpreted frame)
 - org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run() @bci=4, line=61 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Compiled frame)
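
With a dump this size it can help to reduce each thread to one line before looking for a pattern. A small sketch, assuming the `jstack -F` layout shown above (a `Thread <id>: (state = ...)` header followed by ` - frame` lines, each frame on a single unwrapped line as in the file on disk); `summarize_threads` is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch only: prints "<thread id> <state> <top frame>" for every thread
# found in a `jstack -F` style dump file given as the first argument.
summarize_threads() {
    awk '
        # Header line, e.g. "Thread 22685: (state = BLOCKED)"
        /^Thread [0-9]+: \(state = / {
            id = $2;    sub(/:$/, "", id)
            state = $5; sub(/\)$/, "", state)
            want_frame = 1
            next
        }
        # First frame after a header is the top of that thread stack.
        want_frame && /^ - / {
            frame = $2
            sub(/\(.*/, "", frame)   # drop the argument list
            print id, state, frame
            want_frame = 0
        }
    ' "$1"
}
```

For example, `summarize_threads jstack_2020-12-16_31415.log | cut -d' ' -f2- | sort | uniq -c | sort -rn` gives a count per state and top frame, which shows at a glance how many threads are BLOCKED in `Poller.add`.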

The Poller add(...) method is as follows:

        private void add(long socket, long timeout, int flags) {
            if (log.isDebugEnabled()) {
                String msg = sm.getString("endpoint.debug.pollerAdd",
                        Long.valueOf(socket), Long.valueOf(timeout),
                        Integer.valueOf(flags));
                if (log.isTraceEnabled()) {
                    log.trace(msg, new Exception());
                } else {
                    log.debug(msg);
                }
            }
            if (timeout <= 0) {
                // Always put a timeout in
                timeout = Integer.MAX_VALUE;
            }
            synchronized (this) {
                // Add socket to the list. Newly added sockets will wait
                // at most for pollTime before being polled.
                if (addList.add(socket, timeout, flags)) {
                    this.notify();
                }
            }
        }


Don't you find it weird that all these threads are waiting to synchronize on
the Poller instance, while no thread is inside this block or any other block
or method synchronized on it?
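
The forced dump quoted above (`jstack -F`) shows no lock lines, so it cannot tell us which thread owns the monitor. A regular `jstack <pid>` dump, when the JVM still responds to it, marks monitors with `- locked <address>` and `- waiting to lock <address>` lines, which makes the owner searchable. A sketch under that assumption; `find_lock_owner` and `count_lock_waiters` are hypothetical helper names, and the address is whatever appears in your own dump:

```shell
#!/bin/bash
# Sketch only: both helpers expect a regular (non -F) jstack dump file.

# find_lock_owner DUMP ADDR
# Prints the header line of every thread that holds the monitor at ADDR.
find_lock_owner() {
    local dump=$1 addr=$2
    awk -v addr="$addr" '
        /^"/ { header = $0 }                               # thread header line
        index($0, "- locked <" addr ">") { print header }  # this thread owns it
    ' "$dump"
}

# count_lock_waiters DUMP ADDR
# Prints how many threads are waiting to lock the monitor at ADDR.
count_lock_waiters() {
    local dump=$1 addr=$2
    grep -c -F -- "- waiting to lock <$addr>" "$dump"
}
```

If `count_lock_waiters` reports waiters but `find_lock_owner` prints nothing for the same address, that would support the observation that no Java thread visibly holds the Poller monitor.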





>
> > Attaching to process ID 27753, please wait...
> > Debugger attached successfully.
> > Server compiler detected.
> > JVM version is 25.101-b13
> > Deadlock Detection:
>
> That JVM seems fairly old, too. Consider upgrading to latest Java 8 VM
> (or beyond, if appropriate).
>
> -chris
>