I am trying to track down a bug. I think it is in nslookup (which is why I'm
asking here), but there are so many pieces required to reproduce it that I
cannot tell for sure. Let me explain my setup:
All hosts are running Debian bullseye. None of the problems happened *until* I
upgraded from buster.
Host A (monitoring):
- Installed: nagios4 (4.4.6-4), nrpe-ng (0.2.0-1)
- IP address: 192.0.2.1
Host B (monitored):
- Installed: nrpe-ng (0.2.0-1), monitoring-plugins-standard (2.3.1-1),
bind9-dnsutils (9.16.27-1~deb11u1)
- IP address: 192.0.2.2
Host C (monitored through host B):
- Installed: bind9
- IP address: 192.0.2.3
- Configured to answer authoritatively for example.com on port 53.
I run the following on Host B:
$ /usr/lib/nagios/plugins/check_dns -H example.com -s 192.0.2.3 -A -w 0.1 -c 1.0
DNS OK: 0.070 seconds response time. example.com returns 192.0.2.10,2001:db8::10|time=0.069825s;0.100000;1.000000;0.000000
check_dns (part of monitoring-plugins-standard) invokes nslookup internally,
and the response above looks good. When I run nslookup explicitly, it also
looks good:
$ /usr/bin/nslookup -sil example.com 192.0.2.3
Server: 192.0.2.3
Address: 192.0.2.3#53
Name: example.com
Address: 192.0.2.10
Name: example.com
Address: 2001:db8::10
Now I set things up for monitoring using nrpe-ng with the following
configuration:
              nrpe
           over HTTPS                  DNS
Host A ------------------> Host B -------------> Host C
On Host B, I run the following:
sudo /usr/bin/python3 /usr/sbin/nrpe-ng --debug -f --config /etc/nagios/nrpe-ng.cfg
While that is running, I run the following on Host A:
/usr/lib/nagios/plugins/check_nrpe_ng -H 192.0.2.2 -c check_dns -a example.com 192.0.2.3 0.1 1.0
I can see the DNS request and response on the wire (using tcpdump).
The result of running the command on Host A is:
DNS CRITICAL - '/usr/bin/nslookup -sil' msg parsing exited with no address
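check_dns turns nslookup's output into the list of addresses it reports, and fails with "msg parsing exited with no address" when that list comes out empty. Below is a minimal Python sketch of that parsing step. It is a simplified approximation for illustration, not the plugin's actual C source, and `extract_addresses` is a hypothetical name. It shows why an nslookup that aborts before printing its answer section would leave check_dns with no addresses, even though the DNS exchange itself succeeded on the wire.

```python
from typing import List


def extract_addresses(nslookup_output: str) -> List[str]:
    """Collect answer addresses from `nslookup -sil` style output.

    Simplified approximation, not check_dns's actual C code: the
    "Address:" line that follows "Server:" describes the resolver itself,
    so addresses are only collected once a "Name:" line has been seen.
    """
    addresses = []
    seen_name = False
    for line in nslookup_output.splitlines():
        line = line.strip()
        if line.startswith("Name:"):
            seen_name = True
        elif seen_name and line.startswith("Address:"):
            # Split on the first colon only, so IPv6 addresses stay intact.
            addresses.append(line.split(":", 1)[1].strip())
    return addresses


if __name__ == "__main__":
    normal = (
        "Server: 192.0.2.3\n"
        "Address: 192.0.2.3#53\n"
        "\n"
        "Name: example.com\n"
        "Address: 192.0.2.10\n"
        "Name: example.com\n"
        "Address: 2001:db8::10\n"
    )
    aborted = ""  # nslookup died before writing anything to stdout
    for label, text in (("normal run", normal), ("aborted run", aborted)):
        found = extract_addresses(text)
        print(label, "->", found if found else "no address")
```

With the output captured earlier on Host B, the sketch finds both answer addresses; with an empty stdout (the assertion message goes to stderr), it finds none.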
On Host B, I see the following debug output:
200 POST /v1/check/check_dns (192.0.2.1) 78.05ms
Executing: /usr/lib/nagios/plugins/check_dns -H example.com -s 192.0.2.3 -A -w 0.1 -c 1.0
(This is the same command I ran manually earlier.)
After rerunning nrpe-ng under strace:
sudo strace --read=4 -F /usr/bin/python3 /usr/sbin/nrpe-ng --debug -f --config /etc/nagios/nrpe-ng.cfg
I see the following in the strace output on Host B:
[pid 1390861] read(4, "nslookup: ./src/unix/core.c:570:"..., 4096) = 83
| 00000 6e 73 6c 6f 6f 6b 75 70 3a 20 2e 2f 73 72 63 2f nslookup: ./src/ |
| 00010 75 6e 69 78 2f 63 6f 72 65 2e 63 3a 35 37 30 3a unix/core.c:570: |
| 00020 20 75 76 5f 5f 63 6c 6f 73 65 3a 20 41 73 73 65 uv__close: Asse |
| 00030 72 74 69 6f 6e 20 60 66 64 20 3e 20 53 54 44 45 rtion `fd > STDE |
| 00040 52 52 5f 46 49 4c 45 4e 4f 27 20 66 61 69 6c 65 RR_FILENO' faile |
| 00050 64 2e 0a d.. |
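So it appears that the nslookup process is aborting on a failed assertion in
libuv's uv__close(), which checks that the descriptor being closed is not
stdin, stdout, or stderr (fd > STDERR_FILENO). The assertion is this line of
code: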
https://github.com/libuv/libuv/blob/fb76f210eb6f093bc06a2f07646e56851818ccf2/src/unix/core.c#L602
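One explanation that would fit this assertion, offered only as an untested hypothesis: POSIX hands out the lowest unused descriptor number, so if whatever launches check_dns (and therefore nslookup) starts it with stdin, stdout, or stderr closed, a socket or pipe that libuv opens can land on fd 0-2, and closing it later trips `fd > STDERR_FILENO`. Below is a minimal Python sketch for trying to reproduce that outside nrpe-ng, assuming the closed descriptor is stdin. `run_with_closed_stdin` is a hypothetical helper, not part of nrpe-ng or check_dns.

```python
import os
import subprocess
import sys


def _close_stdin_in_child():
    """Runs in the forked child just before exec: close descriptor 0."""
    try:
        os.close(0)
    except OSError:
        pass  # fd 0 was already closed; nothing to do


def run_with_closed_stdin(argv):
    """Run argv with stdin (fd 0) closed and capture stdout/stderr as text.

    Hypothesis under test (an assumption, not confirmed): because the
    lowest free descriptor is reused, the first file, socket, or pipe the
    child opens becomes fd 0, which libuv's uv__close() refuses to close.
    """
    return subprocess.run(
        argv,
        stdin=None,  # inherit fd 0 from the parent; it is closed in the child
        capture_output=True,  # stdout/stderr become pipes on fds 1 and 2
        text=True,
        preexec_fn=_close_stdin_in_child,
    )


if __name__ == "__main__":
    # Self-contained demo: a child Python process opens /dev/null and
    # prints the descriptor number the kernel gave it.
    probe = [sys.executable, "-I", "-c",
             "import os; print(os.open(os.devnull, os.O_RDONLY))"]
    result = run_with_closed_stdin(probe)
    print("descriptor opened by child:", result.stdout.strip())
```

On Host B the same helper could be pointed at the real plugin, e.g. `run_with_closed_stdin(["/usr/lib/nagios/plugins/check_dns", "-H", "example.com", "-s", "192.0.2.3", "-A", "-w", "0.1", "-c", "1.0"])`, and its `stdout`/`stderr` compared with what nrpe-ng returns. In a POSIX shell, appending `<&-` to the command closes stdin in the same way. Whether nrpe-ng actually starts plugins with a standard descriptor closed is not established here; this only tests one possible explanation, and the same idea could be applied to fds 1 and 2.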
However, I cannot reproduce it outside of nrpe-ng/check_dns/nslookup. I need
the help of someone more knowledgeable. Thoughts? Suggestions?
Thanks,
Casey
--
bind-users mailing list
bind-users@lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users