I filed two issues for Blackbox on GitHub: one for exposing at least the 'tc' flag state as a metric, and one for letting Blackbox set an EDNS option for a larger reply size (which the underlying Go DNS library that Blackbox uses supports). I didn't file an issue for UDP to TCP fallback because I suspect it is out of scope for Blackbox. It also raises design questions, such as how the metrics should work, since on a fallback Blackbox would be making two DNS requests.
For any interested parties, these are:

https://github.com/prometheus/blackbox_exporter/issues/1258
https://github.com/prometheus/blackbox_exporter/issues/1259

	- cks

> Thanks for the detailed post. Sounds like a feature request/bug report. I
> would file an issue on GitHub, this should be easily solved.
>
> https://github.com/prometheus/blackbox_exporter/issues
>
> On Wed, Jun 26, 2024 at 12:19 AM Chris Siebenmann <
> cks.prom-users...@cs.toronto.edu> wrote:
>
> > To make a long story short, we've been having mysterious probe failures
> > with one of our Blackbox DNS probes against (only) some DNS servers that
> > turned out to be because Blackbox UDP DNS probes have a 512-byte limit
> > on the size of the reply, because Blackbox doesn't currently set EDNS
> > options to increase the allowed reply size and doesn't fall back to a
> > TCP query if the UDP query fails because of truncation. We think this
> > was partially due to these DNS servers using DNS cookies, which
> > increases the reply size.
> >
> > (Our DNS probe checks not just for a successful reply but that the query
> > resolved to at least one A record, so some of the time the reply could
> > be long enough that the truncated version didn't include any of the A
> > records.)
> >
> > Right now the only way to know for sure that your DNS query failed
> > because of truncation is to examine Blackbox probe logs, usually through
> > its web interface (but you can manually query with '..&debug=true'), and
> > notice that one of the log messages reports something like 'flags: qr tc
> > rd ra;' (the 'tc' is the important bit). If you are sure you know how
> > many resource records should be in the various sections of the DNS
> > replies, you can check if the probe got the right number of RRs using
> > the probe_dns_*_rrs metrics.
> >
> > For DNS servers that accept TCP connections, you can work around this by
> > switching your Blackbox DNS module to using TCP instead of the (default)
> > UDP.
> >
> > (I suspect that most people will never run into this, but for our sins
> > we check some external DNS names that have long CNAME chains and other
> > fun things.)
> >
> >       - cks
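
For anyone who wants to check for truncation outside of Blackbox, the following is a minimal Python sketch of the two mechanisms discussed in this thread: reading the 'tc' flag from a raw DNS reply, and adding an EDNS OPT record to a query to advertise a larger UDP reply size. It uses only the standard library and follows the wire formats in RFC 1035 and RFC 6891. It is not Blackbox's code; the function names, the fixed query ID, and the 4096-byte size are illustrative assumptions.

```python
import struct

# Bit masks within the 16-bit flags field of the DNS header (RFC 1035).
TC_BIT = 0x0200  # reply was truncated
RD_BIT = 0x0100  # recursion desired

OPT_TYPE = 41    # EDNS(0) OPT pseudo-record type (RFC 6891)


def build_query(name, qtype=1, edns_size=0, query_id=0x1234):
    """Build a minimal DNS query (hypothetical helper, not a Blackbox API).

    qtype=1 asks for A records. If edns_size is non-zero, an OPT record
    advertising that UDP payload size is appended.
    """
    arcount = 1 if edns_size else 0
    # Header: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
    header = struct.pack("!HHHHHH", query_id, RD_BIT, 1, 0, 0, arcount)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN
    opt = b""
    if edns_size:
        # OPT record: root name, TYPE=41, CLASS=payload size, TTL=0, RDLEN=0
        opt = b"\x00" + struct.pack("!HHIH", OPT_TYPE, edns_size, 0, 0)
    return header + question + opt


def is_truncated(reply):
    """Return True if the 'tc' flag is set in a raw DNS reply."""
    if len(reply) < 12:
        raise ValueError("DNS message is shorter than the 12-byte header")
    (flags,) = struct.unpack("!H", reply[2:4])
    return bool(flags & TC_BIT)


if __name__ == "__main__":
    query = build_query("example.com", edns_size=4096)
    print(len(query), "byte query with an EDNS OPT record")
    # A bare header with 'qr tc rd ra' set, matching the flags in the probe log.
    fake_reply = struct.pack("!HHHHHH", 0x1234, 0x8380, 1, 0, 0, 0)
    print("truncated:", is_truncated(fake_reply))
```

A client doing UDP to TCP fallback would send the query over UDP, call something like is_truncated() on the reply, and repeat the query over TCP if it returns True. That second query is the extra request that makes the metrics question awkward.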