On Tue, Aug 30, 2022, 2:13 PM Casey Deccio <ca...@deccio.net> wrote:

> Hi all,
>
> I am having trouble tracking down a bug in my monitoring setup.  It all
> happened when I upgraded the monitored host (host B in my example below) to
> bullseye.  Note that Host A is also running bullseye, but the problem
> didn't show itself until Host B was upgraded.
>
> Here is the setup:
>
> Host A (monitoring):
> Installed: nagios4, nrpe-ng
> IP address: 192.0.2.1
>
> Host B (monitored):
> Installed: nrpe-ng, monitoring-plugins-standard, bind9-dnsutils
> IP address: 192.0.2.2
>
> Host C (monitored through host B):
> Installed: bind9
> IP address: 192.0.2.3
> Configured to answer authoritatively for example.com on port 53.
>
>                  nrpe
>             over HTTPS                      DNS
> Host A ------------------> Host B -------------> Host C
>

When you run check_dns by hand on Host B, you don't say which user you are
logged in as. That can make a difference. Nagios runs its scripts in a
known environment, which may be different from what you expect.
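
One way to test this is to run the plugin from a small wrapper that strips
the environment down, instead of from your login shell. The following is only
a rough sketch: the helper name run_like_daemon, the minimal PATH, the working
directory "/" and stdin redirected from /dev/null are all my assumptions about
what might differ, not a description of what nrpe-ng actually does. The
check_dns command line is the one from your debug output.

```python
import subprocess
import sys


def run_like_daemon(cmd, env=None):
    """Run cmd with a minimal environment and no terminal on stdin.

    Returns (exit status, stdout, stderr). The defaults are guesses at a
    daemon-like environment, not a copy of what nrpe-ng sets up.
    """
    if env is None:
        # Assumed minimal environment: no HOME, USER, LANG, etc.
        env = {"PATH": "/usr/sbin:/usr/bin:/sbin:/bin"}
    proc = subprocess.run(
        cmd,
        env=env,
        cwd="/",                   # assumed working directory
        stdin=subprocess.DEVNULL,  # no interactive stdin
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,    # keep stderr separate so errors are visible
        text=True,
        check=False,
    )
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    cmd = [
        "/usr/lib/nagios/plugins/check_dns",
        "-H", "example.com", "-s", "192.0.2.3",
        "-A", "-w", "0.1", "-c", "1.0",
    ]
    try:
        status, out, err = run_like_daemon(cmd)
    except FileNotFoundError:
        print("plugin not found: " + cmd[0], file=sys.stderr)
    else:
        print("exit status:", status)
        print("stdout:", out.strip())
        print("stderr:", err.strip())
```

If the result differs from what you see in your shell, add your shell's
variables back to env one at a time (or change stdin and cwd) until the
behaviour changes. Running it under sudo -u as the user nrpe-ng runs as would
also cover the "who are you logged in as" question.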

> On Host B, I run the following:
> sudo /usr/bin/python3 /usr/sbin/nrpe-ng --debug -f --config
> /etc/nagios/nrpe-ng.cfg
>
> While that is running, I run the following on Host A:
> /usr/lib/nagios/plugins/check_nrpe_ng -H 192.0.2.2 -c check_dns -a
> example.com 192.0.2.3 0.1 1.0
>
> The result of running the command on Host A is:
> DNS CRITICAL - '/usr/bin/nslookup -sil' msg parsing exited with no address
>
> On Host B, I see the following debug output:
> 200 POST /v1/check/check_dns (192.0.2.1) 78.05ms
> Executing: /usr/lib/nagios/plugins/check_dns -H example.com -s 192.0.2.3
> -A -w 0.1 -c 1.0
>
> When I run this exact command on Host B, I get:
> $ /usr/lib/nagios/plugins/check_dns -H example.com -s 192.0.2.3 -A -w 0.1
> -c 1.0
> DNS OK: 0.070 seconds response time. example.com returns
> 192.0.2.10,2001:db8::10|time=0.069825s;0.100000;1.000000;0.000000
>
> Looks good!  When I run nslookup (run by check_dns), it looks good too:
> $ /usr/bin/nslookup -sil example.com 192.0.2.3
> Server: 192.0.2.3
> Address: 192.0.2.3#53
>
> Name: example.com
> Address: 192.0.2.10
> Name: example.com
> Address: 2001:db8::10
>
> After rerunning nrpe-ng with strace -f, I see something:
>
> [pid 1183842] write(2, "nslookup: ./src/unix/core.c:570:"..., 83) = 83
> ...
> [pid 1183841] read(4, "nslookup: ./src/unix/core.c:570:"..., 4096) = 83
>
> So it appears that the nslookup process is reporting an error.  But I
> cannot reproduce it outside of nrpe-ng.
>
> Any suggestions?
>
> Casey
>
