Re: native connector, server problems with "No data received", what could be causing it?

Christopher Schultz Wed, 16 Dec 2020 10:07:21 -0800

Mladen,

On 12/16/20 10:12, Mladen Adamović wrote:

On Wed, Dec 16, 2020 at 3:27 PM Christopher Schultz <ch...@christopherschultz.net> wrote:
     > We have a self-monitoring script which runs on server and when the server
     > is not working properly it does a log save and the service restart.

    How do you detect this state? Just make a request and if you get "No
    data received" from curl, you restart the server?
If there is an error code or the specific text doesn't appear on the response we monitor the state and do /etc/init.d/tomcat restart.
The full script is:
#!/bin/bash
serverFailure=0
cd /root
rm /root/numbeo_test.out
#wget -t 1 -T 5 --no-proxy --no-cache --cache=off -q 'localhost:8080/cost-of-living/city_result.jsp?country=Ireland&city=Dublin' -O /root/numbeo_test.out
#curl -L -m 2 -v -o /root/numbeo_test.out --trace curl.log 'localhost:8008/cost-of-living/in/Dublin'
curl -L -m 2 -v --insecure -o /root/numbeo_test.out --trace curl.log 'https://localhost:8181/cost-of-living/in/Dublin'
wgetOutput=$?

grep -q "entries in the past" /root/numbeo_test.out
if [ $? != 0 ]; then
cd /root
rm /root/numbeo_test.out
sleep 10s
#wget -t 2 -T 2 --no-proxy --no-cache --cache=off -q 'localhost:8080/cost-of-living/city_result.jsp?country=Ireland&city=Dublin' -O /root/numbeo_test.out
#curl -L -m 2 --retry 1 -v -o /root/numbeo_test.out --trace curl.log 'localhost:8008/cost-of-living/in/Dublin'
curl -L -m 2 -v --insecure -o /root/numbeo_test.out --trace curl.log 'https://localhost:8181/cost-of-living/in/Dublin'
   wgetOutput=$?
grep -q "entries in the past" /root/numbeo_test.out

if [ $? != 0 ]; then
#echo 'server is down!';
ps -eo pid,comm | while read pid command
do
    if [[ "$command" = "java" ]]
        then
                echo $pid
                DATE=`date +%Y-%m-%d`
                echo ${wgetOutput} > ~/wget_${DATE}_${pid}.log
cp /root/numbeo_test.out >~/numbeo_test_out_${DATE}_${pid}.log
                jstack -J-d64 -F $pid > ~/jstack_${DATE}_${pid}.log
                iostat > ~/iostat_${DATE}_${pid}.log
                vmstat > ~/vmstat_${DATE}_${pid}.log
                netstat -tnp > ~/netstat_${DATE}_${pid}.log
netstat -anp | grep 'tcp\|udp' | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n >~/netstat_anp_outline_${DATE}_${pid}.log
                ps aux > ~/ps_aux_${DATE}_${pid}.log
tail -n 5000 ~glassfish/apache-tomcat-8.5.5/logs/catalina.out >~/catalina_out_${DATE}_${pid}.log
                break
    fi
done
echo 'too many server failures... going to reboot softly' >> ~/reboot.log ;
date | mail -s "Numbeo soft reset" mladen.adamo...@gmail.com
date >> ~/reboot.log
killall -9 java
/root/fix_letsencrypt_chmod.sh
#/etc/init.d/glassfish start
/etc/init.d/tomcat start
#reboot
fi
fi

That seems a little fragile, but it's your server so I guess you can do what you want.

    I see you are using Let's Encrypt. How are you managing the rotating of
    the keys and certificates?


Crontab: 5   1  1   *   *     /root/renew_cert_numbeo.sh
root@condor1796 ~ # cat renew_cert_numbeo.sh
#!/bin/bash

mkdir -p /tmp/letsencrypt/public_html
certbot certonly -n --force-renewal --webroot --webroot-path /tmp/letsencrypt/public_html -d numbeo.com -d www.numbeo.com \
 -d es.numbeo.com -d pt.numbeo.com -d fr.numbeo.com -d ru.numbeo.com -d ja.numbeo.com -d de.numbeo.com -d nl.numbeo.com \
 -d it.numbeo.com -d zh.numbeo.com -d ar.numbeo.com -d jobs.numbeo.com \
 --agree-tos --email mladen.adamo...@gmail.com
/root/fix_letsencrypt_chmod.sh
if [ $? != 0 ]; then
date | mail -s "Lets encrypt renew certificate fails for numbeo.com" mladen.adamo...@gmail.com
else
    /etc/init.d/tomcat restart
fi

root@condor1796 ~ # cat fix_letsencrypt_chmod.sh
#!/bin/bash
chmod o+rx /etc/letsencrypt
chmod -R o+rx /etc/letsencrypt/*

root@condor1796 ~ #

I think your scripts will restart Tomcat even when it's not necessary. The $? check before sending the email message looks like it should be checking the result of the certbot command, but it's checking the result of the chmod command instead. (Or maybe the result of the .sh script, which will probably be 0.)
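
A minimal sketch of the ordering described above, assuming the goal is to restart Tomcat only after a successful renewal. The function name renew_then_restart and its three arguments are hypothetical placeholders: in the real script they would stand for the certbot invocation, /root/fix_letsencrypt_chmod.sh, and /etc/init.d/tomcat restart.

```shell
#!/bin/bash
# Sketch only. renew_then_restart is a hypothetical helper; its arguments
# stand in for the real commands so the flow can be tried without certbot:
#   $1 = renewal command, $2 = permission-fix command, $3 = restart command
renew_then_restart() {
    # Test the renewal command directly, so its exit status cannot be
    # overwritten by a later command before it is checked.
    if ! $1; then
        echo "renewal failed; not restarting"
        return 1
    fi
    $2    # fix permissions only after a successful renewal
    $3    # restart so the new certificate is picked up
    echo "renewed; restart requested"
}
```

With the real commands passed in, the failure branch is where the notification mail would go, and the restart is reached only when the renewal itself reported success.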

     > *What would be the next steps how to identify the problem and perhaps
     > solve it?*
    What have you done so far?


aaah... reading the Tomcat source to try to understand the state of Threads.

    I don't see anything that sticks out in your thread dump.
There are several threads which are trying to get monitor in AprEndpoint$Poller.add and no thread seems to be blocking it. Don't you find it weird:
root@condor1796 ~ # grep Poller jstack_2020-12-16_31415.log  | grep "Apr"
- org.apache.tomcat.util.net.AprEndpoint$Poller.add(long, long, int)@bci=102, line=1398 (Compiled frame)

I might have found that odd had you posted that in your original message, but you did not.

You need to show the full stack trace for that thread to make it meaningful. Sockets are added to the poller all the time. It's not unusual to see that happening. If they are getting *stuck*, that would be bad, of course.
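
One way to pull out the complete stack of each thread that contains a given frame, rather than the single matching line that grep returns. This is a sketch under an assumption: it treats the dump as blank-line-separated blocks, one per thread, which is how forced (jstack -F) dumps are usually laid out. The function name threads_with_frame is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print every thread block in dump file $2 whose text
# contains the literal string $1. Assumes blocks are separated by blank lines.
threads_with_frame() {
    # RS= puts awk in paragraph mode (one record per blank-line-separated block);
    # index() does a literal substring match, so "$" and "." need no escaping.
    awk -v RS= -v ORS='\n\n' -v needle="$1" 'index($0, needle)' "$2"
}
```

For the dump mentioned above, a call might look like: threads_with_frame 'AprEndpoint$Poller.add' jstack_2020-12-16_31415.log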

Don't you find it weird that all threads are trying to get synchronized on a Poller instance and no one is in this block or another synchronized block/method?

I would find it weird if no threads were making any progress. Lots of threads adding sockets to the poller is not out of the ordinary.
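
Whether threads are stuck or merely passing through can be checked by comparing two dumps taken a few seconds apart: a thread that shows the same frame in both is a candidate for being stuck, while one that shows it in only one dump was most likely making progress. A rough sketch, assuming blank-line-separated thread blocks whose first line looks like "Thread <id>: (state = ...)"; both helper names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: list the "Thread <id>" label of every thread block in
# dump file $2 that contains the literal string $1.
thread_ids_with_frame() {
    awk -v RS= -v needle="$1" '
        BEGIN { FS = "\n" }          # one field per line of the block
        index($0, needle) {
            sub(/:.*/, "", $1)       # drop the state, which may change between dumps
            print $1
        }' "$2"
}

# Labels that show frame $1 in both dump $2 and dump $3.
threads_in_both() {
    { thread_ids_with_frame "$1" "$2"; thread_ids_with_frame "$1" "$3"; } |
        sort | uniq -d
}
```

An empty result suggests the threads seen in the frame are different ones each time, which is the ordinary case. A non-empty result is only a hint, and the full stacks of those threads are still needed to say more.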

If you suspect a bug in Tomcat's socket handling, upgrading to the latest 8.5.x release and re-trying would be the best move. There have been many fixes since your 8.5.5 release which is now 4+ years old.


-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
