Hi Paul, Warren,

On 4 July 2014 at 16:50:08, Paul Hoffman (paul.hoff...@vpnc.org) wrote:
> Greetings. Warren and I have done a major revision on this draft,
> narrowing the design goals, and presenting more concrete proposals
> for how the mechanism would work. We welcome more feedback, and hope
> to discuss it in the WG in Toronto.

I think there is much in the language of this draft that could be tightened up, but this is an idea for discussion so I'll avoid a pedantic line-by-line dissection. But I can give you the full pedantry if you like :-)

On the pros and cons, however (crudely pasted below), my comments are inline.

TL;DR: there are way more cons than pros to this proposal. The pros listed are weak; the cons listed are serious. I don't see a net advantage to the DNS (or to perceived performance of the DNS for any client) here. This proposal, if implemented, would represent non-trivial additional complexity with minimal or no benefit. I am not in favour of it, if that's not obvious.

As noted previously, I am not against documenting and discussing the merits of slaving the root zone on resolvers (in some fashion). My preference would be for a draft called something like "draft-ietf-dnsop-slaving-root-on-resolvers-harmful", which could borrow much of your sections 5.1 and 5.2 to make its argument.

I remain very much *not* in favour of making changes to the DNS specification that don't have a clear benefit to balance their costs.

---

5.1. Pros

o Junk queries / negative caching - Currently, a significant number of queries to the root servers are "junk" queries. Many of these queries are for TLDs that do not (and may never) exist in the root. Another significant source of junk is queries where the negative TLD answer did not get cached because the queries are for second-level domains (a negative cache entry for "foo.example" will not cover a subsequent query for "bar.example").

I think a better way to accommodate the second point is to implement qname minimisation in recursive server logic (there's a rough sketch of this after my comments on this bullet).

I don't know that the first point is much of a pro. Root server operators need to provision significant spare capacity in order to accommodate flash crowds and attack traffic, and compared to that spare capacity the volume of junk queries is extremely small. There's no obvious operational benefit to root server operators in reducing their steady-state query load (in fact, it would make it harder in some cases to obtain the exchange point capacity you need to accommodate flash crowds, on exchanges where higher-capacity ports are reserved for those that have demonstrable need based on steady-state traffic).

I'm also a little concerned about the word "junk". It's a pejorative term that implies assumptions about the intent of the original query. If my intent is to confirm that a top-level label doesn't exist, then "BLAH/IN/SOA" is a perfectly reasonable query for me to send to a root server. We might assume that a query "Joe's iPhone/IN/SOA" sent to a root server is not reasonable, but we're only assuming; we don't actually have a way of gauging the actual intent with any accuracy.
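The promised sketch of the qname minimisation point (my illustration, not anything from the draft; it assumes the dnspython library, and "tld_exists" and the negative cache are hypothetical helpers): a minimising resolver only ever exposes the TLD label to the root, so a single cached NXDOMAIN for "example." covers both "foo.example" and "bar.example".

    # Rough sketch: a qname-minimising check that sends only the TLD
    # label to the root. Assumes dnspython; the address used is
    # a.root-servers.net. "tld_exists" is my own illustrative helper.
    import dns.message
    import dns.query
    import dns.rcode

    negative_cache = set()                  # TLDs known not to exist

    def tld_exists(name, root="198.41.0.4"):
        tld = name.rstrip(".").split(".")[-1] + "."
        if tld in negative_cache:
            return False                    # cache hit: no root query at all
        q = dns.message.make_query(tld, "NS")
        r = dns.query.udp(q, root, timeout=5)
        if r.rcode() == dns.rcode.NXDOMAIN:
            negative_cache.add(tld)         # one entry covers every name
            return False                    # under this TLD
        return True

    tld_exists("foo.example.")  # one root query: "example./NS" -> NXDOMAIN
    tld_exists("bar.example.")  # no root query: negative cache hit

Sent with full qnames instead, those two lookups would produce two separate root queries and two separate negative cache entries.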
o DoS against the root service - By distributing the contents of the root to many recursive resolvers, the DoS protection for customers of the root servers is significantly increased. A DDoS may still be able to take down some recursive servers, but there is much more root service infrastructure to attack in order to be effective. Of course, there is still a zone distribution system that could be attacked (but it would need to be kept down for a much longer time to cause significant damage), and so far the root has stood up just fine to DDoS.

If I were to paraphrase this advantage with malicious intent :-), you mean that "we don't have to rely upon the root server system to continue to perform under attack, because we don't need the root server system any more, although we do need the new bits of the root server system we are specifying, and if those bits are not available we do need the conventional root server system after all, but that's probably ok because the root server system is pretty resilient". That sounds a bit circular.

o Small increase to privacy of requests - This also removes a place where attackers could collect information. Although query name minimization also achieves some of this, it does still leak the TLDs that people behind a resolver are querying for, which may in itself be a concern (for example, someone in a homophobic country who is querying for a name in .gay).

There's an implication here that a recursive resolver sending a query to a root server is potentially impinging upon the privacy of its anonymous clients. I find that a bit difficult to swallow.

I'm surprised not to see "improves performance for clients" in this list, on the general principle that every cache miss that triggers a query to a root server will take longer than consulting a pre-fetched root zone. I'm glad about that, though, since I think that performance improvement is (a) minuscule in normal operation, affecting 1/BIGNUM clients who expose a cache miss, and (b) also achievable in the steady state by resolvers that perform cache pre-fetching (e.g. hammer-like behaviour).

My overall summary for 5.1 is that there's no clear benefit in performance, reliability or stability from making this change.

5.2. Cons

o Loss of agility in making root zone changes - Currently, if there is an error in the root zone (or someone needs to make an emergency change), a new root zone can be created, and the root server operators can be notified and start serving the new zone quickly. Of course, this does not invalidate the bad information in (long TTL) cached answers. Notifying every recursive resolver is not feasible. Currently, an "oops" in the root zone will be cached for the TTL of the record by some percentage of servers. Using the technique described above, the information may be cached (by the same percentage of servers) for the refresh time + the TTL of the record.

A new root zone is published usually two (but sometimes more) times per day. The semantics specified in the draft for refreshing a local copy of the root zone say "keep re-using the copy you have until it expires". If I assume that "expire" means the copy survives until SOA.EXPIRE seconds after it was originally fetched, then there's the potential for stale data to be published for a week plus however old the originally-retrieved file was (which is difficult to determine, in contrast to the traditional root zone distribution scheme). I think this disadvantage is more serious than is presented.
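To put rough numbers on that (my arithmetic; the one-week figure is the root zone's SOA EXPIRE of 604800 seconds, while the TTL and file age below are illustrative assumptions, not measurements):

    # Back-of-envelope worst-case staleness window, per the argument above.
    SOA_EXPIRE = 604800   # root zone SOA EXPIRE: 7 days of permitted re-use
    RECORD_TTL = 172800   # assume a 2-day TTL on the "oops" record
    FILE_AGE   = 43200    # assume the fetched copy was already 12 hours old

    worst_case = FILE_AGE + SOA_EXPIRE + RECORD_TTL
    print(worst_case / 86400)   # 9.5 days of potentially stale data,
                                # versus RECORD_TTL alone (2 days) today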
o No central monitoring point - DNS operators lose the ability to monitor the root system. While there is work underway to implement better instrumentation of the root server system, this (potentially) removes the thing to monitor.

In fact there's exactly the same ability to monitor the root server system; it's just that the data available through such monitoring will be different (as you point out in the second sentence). OK, this one is a bit pedantic.

o Increased complexity in nameserver software and their operations - Any proposal for recursive servers to copy and serve the root inherently means more code to write and execute. Note that many recursive resolvers are on inexpensive home routers that are rarely (if ever) updated.

You don't require universal deployment for this scheme to work, so the long tail of DNS software upgrades is arguably not a great concern.

I think the increased complexity in operations is significant, though. My observation is that people already have enough difficulty troubleshooting DNS problems, and adding in a brand new set of services for root zone distribution that also need to be considered (potentially a set of services that are not subject to the same internal and external scrutiny as root server operations, and which are potentially operated by people who are even less familiar and easy to reach than root server operators are) is only going to make things worse. If there were a significant benefit to performance or reliability to balance that increased operational complexity, I might see a reason to do it. I don't see that benefit, though.

o Changes the nature and distribution of traffic hitting the root servers - If all the "good" recursive resolvers deploy root copying, then root servers end up servicing only "bad" recursive resolvers and attack traffic. The roots (could) become what AS112 is for RFC1918.

The difference is that queries directed at AS112 servers are definitively junk. They are requests for names that cannot exist with global uniqueness, since they correspond to infrastructure that is not globally unique. By contrast, the root servers will always receive queries that are important to answer (from an end-user's perspective), even if the proportion of such queries declines following the unexpected and optimistic changes in resolver behaviour implied in that paragraph.

Architecturally, I don't think you improve the quality of a service by reducing the impact of failure and giving the operators busy-work to fill their time.

Joe