On Tue, Aug 30, 2022, 2:13 PM Casey Deccio <ca...@deccio.net> wrote:

> Hi all,
>
> I am having trouble tracking down a bug in my monitoring setup. It all
> happened when I upgraded the monitored host (host B in my example below) to
> bullseye. Note that Host A is also running bullseye, but the problem
> didn't show itself until Host B was upgraded.
>
> Here is the setup:
>
> Host A (monitoring):
> Installed: nagios4, nrpe-ng
> IP address: 192.0.2.1
>
> Host B (monitored):
> Installed: nrpe-ng, monitoring-plugins-standard, bind9-dnsutils
> IP address: 192.0.2.2
>
> Host C (monitored through host B):
> Installed: bind9
> IP address: 192.0.2.3
> Configured to answer authoritatively for example.com on port 53.
>
>               nrpe
>            over HTTPS                 DNS
> Host A ------------------> Host B -------------> Host C

When you run check_dns by hand on Host B, you don't say which user you are logged in as. That can make a difference. Nagios runs its scripts as a particular user in a controlled environment, which may be different from what you expect.

> On Host B, I run the following:
> sudo /usr/bin/python3 /usr/sbin/nrpe-ng --debug -f --config /etc/nagios/nrpe-ng.cfg
>
> While that is running, I run the following on Host A:
> /usr/lib/nagios/plugins/check_nrpe_ng -H 192.0.2.2 -c check_dns -a example.com 192.0.2.3 0.1 1.0
>
> The result of running the command on Host A is:
> DNS CRITICAL - '/usr/bin/nslookup -sil' msg parsing exited with no address
>
> On Host B, I see the following debug output:
> 200 POST /v1/check/check_dns (192.0.2.1) 78.05ms
> Executing: /usr/lib/nagios/plugins/check_dns -H example.com -s 192.0.2.3 -A -w 0.1 -c 1.0
>
> When I run this exact command on Host B, I get:
> $ /usr/lib/nagios/plugins/check_dns -H example.com -s 192.0.2.3 -A -w 0.1 -c 1.0
> DNS OK: 0.070 seconds response time. example.com returns 192.0.2.10,2001:db8::10|time=0.069825s;0.100000;1.000000;0.000000
>
> Looks good! When I run nslookup (run by check_dns), it looks good too:
> $ /usr/bin/nslookup -sil example.com 192.0.2.3
> Server: 192.0.2.3
> Address: 192.0.2.3#53
>
> Name: example.com
> Address: 192.0.2.10
> Name: example.com
> Address: 2001:db8::10
>
> After rerunning nrpe-ng with strace -f, I see something:
>
> [pid 1183842] write(2, "nslookup: ./src/unix/core.c:570:"..., 83) = 83
> ...
> [pid 1183841] read(4, "nslookup: ./src/unix/core.c:570:"..., 4096) = 83
>
> So it appears that the nslookup process is reporting an error. But I
> cannot reproduce it outside of nrpe-ng.
>
> Any suggestions?
>
> Casey
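
One rough way to test the environment theory on Host B is to run the same check twice, once from your login shell and once with an almost empty environment, and compare the exit codes (for Nagios plugins, 0 is OK and 2 is CRITICAL). This is only a sketch: the helper name `compare_env` and the minimal `PATH` are my own hypothetical choices, not something nrpe-ng does, and the exact environment nrpe-ng hands to plugins is an assumption you would want to confirm.

```shell
#!/bin/sh
# Hypothetical helper: run a command twice and print both exit statuses,
# so a difference caused by the environment stands out.
compare_env() {
    # Run in the caller's normal (login shell) environment.
    "$@" >/dev/null 2>&1
    normal=$?
    # Run again with an empty environment plus a minimal PATH.
    # This only approximates how nrpe-ng might run plugins (assumption).
    env -i PATH=/usr/sbin:/usr/bin:/sbin:/bin "$@" >/dev/null 2>&1
    clean=$?
    echo "normal=$normal clean=$clean"
}
```

For example, `compare_env /usr/lib/nagios/plugins/check_dns -H example.com -s 192.0.2.3 -A -w 0.1 -c 1.0`. To also rule out the user, run it again prefixed with `sudo -u` and whichever account nrpe-ng runs its checks as (`nagios` is only a guess; check your nrpe-ng configuration).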