Mladen,
On 12/16/20 10:12, Mladen Adamović wrote:
On Wed, Dec 16, 2020 at 3:27 PM Christopher Schultz
<ch...@christopherschultz.net> wrote:
> We have a self-monitoring script which runs on server and when the server
> is not working properly it does a log save and the service restart.
How do you detect this state? Just make a request and if you get "No
data received" from curl, you restart the server?
If there is an error code or the specific text doesn't appear on the
response we monitor the state and do /etc/init.d/tomcat restart.
The full script is:
#!/bin/bash
serverFailure=0
cd /root
rm /root/numbeo_test.out
#wget -t 1 -T 5 --no-proxy --no-cache --cache=off -q 'localhost:8080/cost-of-living/city_result.jsp?country=Ireland&city=Dublin' -O /root/numbeo_test.out
#curl -L -m 2 -v -o /root/numbeo_test.out --trace curl.log 'localhost:8008/cost-of-living/in/Dublin'
curl -L -m 2 -v --insecure -o /root/numbeo_test.out --trace curl.log 'https://localhost:8181/cost-of-living/in/Dublin'
wgetOutput=$?
grep -q "entries in the past" /root/numbeo_test.out
if [ $? != 0 ]; then
cd /root
rm /root/numbeo_test.out
sleep 10s
#wget -t 2 -T 2 --no-proxy --no-cache --cache=off -q 'localhost:8080/cost-of-living/city_result.jsp?country=Ireland&city=Dublin' -O /root/numbeo_test.out
#curl -L -m 2 --retry 1 -v -o /root/numbeo_test.out --trace curl.log 'localhost:8008/cost-of-living/in/Dublin'
curl -L -m 2 -v --insecure -o /root/numbeo_test.out --trace curl.log 'https://localhost:8181/cost-of-living/in/Dublin'
wgetOutput=$?
grep -q "entries in the past" /root/numbeo_test.out
if [ $? != 0 ]; then
#echo 'server is down!';
ps -eo pid,comm | while read pid command
do
if [[ "$command" = "java" ]]
then
echo $pid
DATE=`date +%Y-%m-%d`
echo ${wgetOutput} > ~/wget_${DATE}_${pid}.log
cp /root/numbeo_test.out ~/numbeo_test_out_${DATE}_${pid}.log
jstack -J-d64 -F $pid > ~/jstack_${DATE}_${pid}.log
iostat > ~/iostat_${DATE}_${pid}.log
vmstat > ~/vmstat_${DATE}_${pid}.log
netstat -tnp > ~/netstat_${DATE}_${pid}.log
netstat -anp |grep 'tcp\|udp' | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n > ~/netstat_anp_outline_${DATE}_${pid}.log
ps aux > ~/ps_aux_${DATE}_${pid}.log
tail -n 5000 ~glassfish/apache-tomcat-8.5.5/logs/catalina.out > ~/catalina_out_${DATE}_${pid}.log
break
fi
done
echo 'too many server failures... going to rebootsoftly' >> ~/reboot.log ;
date | mail -s "Numbeo soft reset" mladen.adamo...@gmail.com
date >> ~/reboot.log
killall -9 java
/root/fix_letsencrypt_chmod.sh
#/etc/init.d/glassfish start
/etc/init.d/tomcat start
#reboot
fi
fi
That seems a little fragile, but it's your server so I guess you can do
what you want.
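For illustration only, here is one way such a check could be made a little more defensive. It is a sketch, not the script from this thread: the helper names `is_healthy` and `check_with_retries` are hypothetical, and it assumes the probe should fail if curl itself fails, if the body is empty, or if the marker text is missing.

```shell
#!/bin/bash
# Sketch of a health check that looks at more than one signal.
# Function names here are hypothetical, not part of any existing tool.

# is_healthy BODY_FILE FETCH_STATUS MARKER
# Succeeds only if the fetch command exited 0, the body is non-empty,
# and the body contains MARKER as a literal string.
is_healthy() {
    local body_file=$1 fetch_status=$2 marker=$3
    [ "$fetch_status" -eq 0 ] || return 1
    [ -s "$body_file" ] || return 1
    grep -qF -- "$marker" "$body_file"
}

# check_with_retries ATTEMPTS DELAY PROBE_CMD
# Runs PROBE_CMD up to ATTEMPTS times, sleeping DELAY seconds between
# failed attempts. Succeeds as soon as one attempt succeeds.
check_with_retries() {
    local attempts=$1 delay=$2 probe=$3 i
    for ((i = 1; i <= attempts; i++)); do
        "$probe" && return 0
        [ "$i" -lt "$attempts" ] && sleep "$delay"
    done
    return 1
}

# Possible wiring (left commented out so nothing runs by accident):
# probe() {
#     curl -sS -L -m 2 --insecure -o /root/numbeo_test.out \
#         'https://localhost:8181/cost-of-living/in/Dublin'
#     is_healthy /root/numbeo_test.out $? "entries in the past"
# }
# check_with_retries 2 10 probe || echo "collect logs and restart here"
```

Keeping the decision in small functions means the restart logic can be exercised with a fake probe before it is pointed at a live server.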
I see you are using Let's Encrypt. How are you managing the rotating of
the keys and certificates?
Crontab: 5 1 1 * * /root/renew_cert_numbeo.sh
root@condor1796 ~ # cat renew_cert_numbeo.sh
#!/bin/bash
mkdir -p /tmp/letsencrypt/public_html
certbot certonly -n --force-renewal --webroot --webroot-path /tmp/letsencrypt/public_html -d numbeo.com -d www.numbeo.com \
-d es.numbeo.com -d pt.numbeo.com -d fr.numbeo.com -d ru.numbeo.com -d ja.numbeo.com -d de.numbeo.com -d nl.numbeo.com \
-d it.numbeo.com -d zh.numbeo.com -d ar.numbeo.com -d jobs.numbeo.com \
--agree-tos --email mladen.adamo...@gmail.com
/root/fix_letsencrypt_chmod.sh
if [ $? != 0 ]; then
date | mail -s "Lets encrypt renew certificate fails for numbeo.com" mladen.adamo...@gmail.com
else
/etc/init.d/tomcat restart
fi
root@condor1796 ~ # cat fix_letsencrypt_chmod.sh
#!/bin/bash
chmod o+rx /etc/letsencrypt
chmod -R o+rx /etc/letsencrypt/*
root@condor1796 ~ #
I think your scripts will restart Tomcat even when it's not necessary.
The $? check before sending the email message looks like it should be
checking the result of the certbot command, but it's checking the result
of the chmod command instead. (Or maybe the result of the .sh script,
which will probably be 0.)
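To make the point about `$?` concrete, here is a minimal sketch of saving the renewal command's exit status before anything else runs. The function `run_renewal` and its four hook arguments are hypothetical; they stand in for the certbot call, the chmod script, the Tomcat restart, and the mail command so that the logic can be tried without touching any of them.

```shell
#!/bin/bash
# Sketch: act on the exit status of the renewal command itself,
# not on whatever command happened to run after it.

# run_renewal RENEW_CMD FIX_CMD RESTART_CMD NOTIFY_CMD
# Each argument is the name of a command or shell function (hypothetical
# hooks). NOTIFY_CMD receives one argument: a short failure message.
run_renewal() {
    local renew_cmd=$1 fix_cmd=$2 restart_cmd=$3 notify_cmd=$4
    local renew_status

    "$renew_cmd"
    renew_status=$?    # save immediately; the next command overwrites $?

    if [ "$renew_status" -ne 0 ]; then
        "$notify_cmd" "renewal failed with status $renew_status"
        return "$renew_status"
    fi

    # Only fix permissions and restart after a successful renewal.
    "$fix_cmd" && "$restart_cmd"
}

# Possible wiring (left commented out so nothing runs by accident):
# renew()   { certbot certonly -n --webroot ...; }   # arguments elided
# fix()     { /root/fix_letsencrypt_chmod.sh; }
# restart() { /etc/init.d/tomcat restart; }
# notify()  { date | mail -s "$1" root; }            # recipient is an assumption
# run_renewal renew fix restart notify
```

With this shape, a failed renewal sends the notification and skips the restart, and a successful one restarts only after the permissions fix has succeeded.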
> *What would be the next steps how to identify the problem and perhaps
> solve it?*
What have you done so far?
aaah... reading the Tomcat source to try to understand the state of Threads.
I don't see anything that sticks out in your thread dump.
Several threads are trying to acquire the monitor in
AprEndpoint$Poller.add, and no thread seems to be holding it. Don't you
find that weird:
root@condor1796 ~ # grep Poller jstack_2020-12-16_31415.log | grep "Apr"
- org.apache.tomcat.util.net.AprEndpoint$Poller.add(long, long, int) @bci=102, line=1398 (Compiled frame)
I might have found that odd had you posted that in your original
message, but you did not.
You need to show the full stack trace for that thread to make it
meaningful. Sockets are added to the poller all the time. It's not
unusual to see that happening. If they are getting *stuck*, that would
be bad, of course.
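As a sketch of how to pull the full stack trace for each matching thread out of a saved dump, rather than just the one matching line that grep shows: this assumes the dump separates threads with blank lines, as jstack output normally does, and the function name `threads_matching` is hypothetical.

```shell
#!/bin/bash
# threads_matching PATTERN FILE
# Prints every blank-line-separated block of FILE that contains PATTERN
# as a literal string (so the "$" in AprEndpoint$Poller needs no escaping).
threads_matching() {
    local pattern=$1 file=$2
    # RS="" puts awk in paragraph mode: one record per blank-line-separated block.
    awk -v RS="" -v ORS="\n\n" -v pat="$pattern" 'index($0, pat) > 0' "$file"
}

# Example (commented out; the file name is the one used earlier in the thread):
# threads_matching 'AprEndpoint$Poller.add' jstack_2020-12-16_31415.log
```

Each block printed this way includes the thread header and every frame below it, which is what is needed to tell a thread that is passing through `Poller.add` from one that is stuck there.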
Don't you find it weird that all threads are trying to get synchronized
on a Poller instance and no one is in this block or another synchronized
block/method?
I would find it weird if no threads were making any progress. Lots of
threads adding sockets to the poller is not out of the ordinary.
If you suspect a bug in Tomcat's socket handling, upgrading to the
latest 8.5.x release and re-trying would be the best move. There have
been many fixes since your 8.5.5 release which is now 4+ years old.
-chris