To make a long story short, we've been having mysterious probe failures
with one of our Blackbox DNS probes against (only) some DNS servers. These
turned out to be because Blackbox UDP DNS probes are limited to 512-byte
replies: Blackbox doesn't currently set EDNS options to increase the
allowed reply size, and it doesn't fall back to a TCP query when the UDP
reply is truncated. We think this was partly because these DNS servers
use DNS cookies, which increase the reply size.
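For readers who haven't looked at the wire format: truncation is signalled by the TC bit in the DNS header, and a client only gets UDP replies larger than 512 bytes if its query carries an EDNS0 OPT pseudo-record (RFC 6891) advertising a bigger payload size. The following Python sketch is an illustration, not anything taken from Blackbox itself. It builds a query with an optional OPT record and checks a reply's TC bit; the function names and the 1232-byte payload size used below are assumptions chosen for the example.

```python
import struct
from typing import Optional

# DNS header flag bits (RFC 1035 section 4.1.1).
FLAG_QR = 0x8000  # message is a response
FLAG_TC = 0x0200  # message was truncated to fit the transport
FLAG_RD = 0x0100  # recursion desired

TYPE_A = 1
TYPE_OPT = 41  # EDNS0 OPT pseudo-RR (RFC 6891)
CLASS_IN = 1


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(qid: int, name: str, qtype: int = TYPE_A,
                edns_payload: Optional[int] = None) -> bytes:
    """Build a one-question DNS query; optionally add an EDNS0 OPT record."""
    arcount = 1 if edns_payload else 0
    # ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
    header = struct.pack("!HHHHHH", qid, FLAG_RD, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", qtype, CLASS_IN)
    msg = header + question
    if edns_payload:
        # OPT RR: root owner name, TYPE=41, CLASS=advertised UDP payload size,
        # TTL=extended RCODE/version/flags (all zero here), RDLENGTH=0.
        msg += b"\x00" + struct.pack("!HHIH", TYPE_OPT, edns_payload, 0, 0)
    return msg


def is_truncated(reply: bytes) -> bool:
    """Return True if the reply's header has the TC (truncation) bit set."""
    if len(reply) < 12:
        raise ValueError("reply is shorter than a DNS header")
    (flags,) = struct.unpack_from("!H", reply, 2)
    return bool(flags & FLAG_TC)
```

For example, `build_query(0x1234, "example.com", edns_payload=1232)` tells the server it may send up to 1232 bytes over UDP. Without the OPT record, a server must keep its UDP reply within 512 bytes and set TC if the full answer doesn't fit.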

(Our DNS probe checks not just for a successful reply but also that the
query resolved to at least one A record, so some of the time the reply
could be long enough that the truncated version didn't include any of
the A records.)

Right now the only way to know for sure that your DNS query failed
because of truncation is to examine the Blackbox probe logs, usually
through its web interface (but you can also query manually with
'..&debug=true'), and notice that one of the log messages reports
something like 'flags: qr tc rd ra;' (the 'tc' is the important bit). If
you are sure you know how many resource records should be in the various
sections of the DNS replies, you can check whether the probe got the
right number of RRs using the probe_dns_*_rrs metrics.
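If you want to automate that check, a small script can fetch the probe's debug output and look for the 'tc' flag and the RR counts. This is a minimal sketch that assumes the debug text contains dig-style 'flags: ...;' sections and Prometheus-format lines for the probe_dns_answer_rrs, probe_dns_authority_rrs and probe_dns_additional_rrs metrics; the exporter address, module name and target in the usage note are hypothetical placeholders.

```python
import re
from urllib.parse import urlencode

# Matches the flag list in a dig-style header, e.g. ";; flags: qr tc rd ra;".
FLAGS_RE = re.compile(r"flags:\s*([a-z ]*?);")
# Matches metric lines such as "probe_dns_answer_rrs 0".
RRS_RE = re.compile(
    r"^probe_dns_(answer|authority|additional)_rrs\s+(\S+)$", re.MULTILINE
)


def debug_probe_url(exporter: str, module: str, target: str) -> str:
    """Build a Blackbox /probe URL with debug output turned on."""
    query = urlencode({"module": module, "target": target, "debug": "true"})
    return f"{exporter}/probe?{query}"


def saw_truncation(debug_text: str) -> bool:
    """Return True if any 'flags:' section in the debug output includes 'tc'."""
    return any("tc" in m.group(1).split() for m in FLAGS_RE.finditer(debug_text))


def rr_counts(debug_text: str) -> dict:
    """Extract the answer/authority/additional RR counts from the metrics."""
    return {m.group(1): float(m.group(2)) for m in RRS_RE.finditer(debug_text)}
```

To use it against a hypothetical exporter at localhost:9115 with a module named dns_udp, fetch `debug_probe_url("http://localhost:9115", "dns_udp", "192.0.2.53")` with `urllib.request.urlopen`, decode the body, and pass the text to `saw_truncation()` and `rr_counts()`.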

For DNS servers that accept TCP connections, you can work around this by
switching your Blackbox DNS module to using TCP instead of the (default)
UDP.
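In the Blackbox configuration this should be the dns prober's transport_protocol setting (check the configuration documentation for your exporter version). TCP replies aren't subject to the 512-byte UDP limit; instead, each DNS message is preceded by a two-byte length field (RFC 1035 section 4.2.2). The Python sketch below shows that framing, plus the usual resolver behaviour that Blackbox currently lacks: try UDP first and retry over TCP only when the UDP reply has the TC bit set. The function names are hypothetical, and the UDP path assumes an IPv4 server address.

```python
import socket
import struct

FLAG_TC = 0x0200  # truncation bit in the DNS header flags word


def frame_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte length, as DNS over TCP requires."""
    if len(message) > 0xFFFF:
        raise ValueError("DNS message too long for TCP framing")
    return struct.pack("!H", len(message)) + message


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return buf


def query_tcp(server: str, query: bytes, timeout: float = 2.0) -> bytes:
    """Send one DNS query over TCP port 53 and return the unframed reply."""
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(frame_tcp(query))
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        return recv_exact(sock, length)


def query_with_fallback(server: str, query: bytes, timeout: float = 2.0) -> bytes:
    """Try UDP first; if the reply has the TC bit set, retry over TCP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(query, (server, 53))
        reply, _ = udp.recvfrom(65535)
    (flags,) = struct.unpack_from("!H", reply, 2)
    if flags & FLAG_TC:
        return query_tcp(server, query, timeout)
    return reply
```

Switching the whole module to TCP, as above, is simpler but costs a TCP handshake on every probe and fails outright against servers that only answer over UDP; the fallback only pays that cost when a reply is actually truncated.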

(I suspect that most people will never run into this, but for our sins
we check some external DNS names that have long CNAME chains and other
fun things.)

        - cks

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/1690853.1719353967%40apps0.cs.toronto.edu.
