On Wed, Apr 15, 2020 at 07:19:43PM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> > This is true for users running local nameservers, which ideally will
> > eventually be everyone, but at present that's far from the case.
> > Differences like concurrent attempts from multiple nameservers and/or
> > lack of TCP fallback on TC are what make netstat fast on musl vs
> > repeatedly stalling for multiple seconds at a time on other
> > implementations. I don't have any data on how often TC happens and if
> > it's actually a big part of the difference, so this is probably worth
> > exploring. But I think it's a separate topic from the issue with DANE
> > on Postfix, so let's set it aside and pick that back up on the musl
> > list or elsewhere later.
> 
> qmail famously used a 512 byte buffer for the DNS response (the maximum
> size of a DNS response over UDP without EDNS0), and it wasn't enough for
> some MX responses at the time.  Pretty much everyone using qmail
> eventually had to patch around this.  (There were also problematic ANY
> queries, if I recall correctly.)

I'd be interested in reading more on this if you know of any references.
Over 512 bytes of MX records seems like a lot, and like a really bad
idea for a domain's configuration, since there have always been (as
you noted with qmail) compatibility problems with some sites being
unable to resolve them.

> DNS practices for mail have changed since then.  Maybe you can get
> away with a 512 byte response buffer these days if you don't use
> DNSSEC.

"If you don't use DNSSEC" is ambiguous. As long as DNSSEC is being
validated in the nameserver the stub contacts (which should be local
to have reasonable trust properties), "using" DNSSEC does not impose
any additional response size requirements on the application/stub
resolver.

> I don't understand your PTR example.  People producing larger PTR
> responses because they add all their virtual hosts to the reverse DNS
> zone seems like a fringe case.  Sure, it happens, but not often.

I think it's probably the concurrent lookups to multiple nameservers
(e.g. local, ISP, and Google/Cloudflare, where local has the fastest
round trip but not much in cache, Google/Cloudflare has nearly
everything in cache but the slowest round trip, and the ISP is in the
middle on both), rather than the lack of TCP fallback, that makes
netstat etc. so much faster.

However, it's not clear how "fallback to TCP" logic should interact
with such concurrent requests. Switch to TCP, and to just one
nameserver, for everything as soon as we get any TC response? Or
something else? If we add a TCP option, it might make sense to make it
unconditional (TCP from the first query, not as a fallback) and
single-nameserver, intended for use with a local nameserver (as a
DNSSEC-validating setup should have) and controlled by resolv.conf.

Rich
