Hi Paul, Warren,

On 4 July 2014 at 16:50:08, Paul Hoffman (paul.hoff...@vpnc.org) wrote:
> Greetings. Warren and I have done a major revision on this draft,
> narrowing the design goals, and presenting more concrete proposals
> for how the mechanism would work. We welcome more feedback, and hope
> to discuss it in the WG in Toronto.

I think there is much in the language of this draft that could be tightened up, but this is an idea for discussion so I'll avoid a pedantic line-by-line dissection. But I can give you the full pedantry if you like :-)

On the pros and cons, however (crudely pasted below), my comments are inline.

TL;DR: there are way more cons than pros to this proposal. The pros listed are weak; the cons listed are serious. I don't see a net advantage to the DNS (or to perceived performance of the DNS for any client) here. This proposal, if implemented, would represent non-trivial additional complexity with minimal or no benefit. I am not in favour of it, if that's not obvious.

As noted previously, I am not against documenting and discussing the merits of slaving the root zone on resolvers (in some fashion). My preference would be for a draft called something like "draft-ietf-dnsop-slaving-root-on-resolvers-harmful", which could borrow much of your sections 5.1 and 5.2 to make its argument.

I remain very much *not* in favour of making changes to the DNS specification that don't have a clear benefit to balance their costs.

---

5.1. Pros

o Junk queries / negative caching - Currently, a significant number of queries to the root servers are "junk" queries. Many of these queries are for TLDs that do not (and may never) exist in the root. Another significant source of junk is queries where the negative TLD answer did not get cached because the queries are for second-level domains (a negative cache entry for "foo.example" will not cover a subsequent query for "bar.example").

I think a better way to accommodate the second point is to implement qname minimisation in recursive server logic (there's a rough sketch of this after my comments on this bullet).

I don't know that the first point is much of a pro. Root server operators need to provision significant spare capacity in order to accommodate flash crowds and attack traffic, and compared to that spare capacity the volume of junk queries is extremely small. There's no obvious operational benefit to root server operators in reducing their steady-state query load (in fact, it would make it harder in some cases to obtain the exchange point capacity you need to accommodate flash crowds, on exchanges where higher-capacity ports are reserved for those that have demonstrable need based on steady-state traffic).

I'm also a little concerned about the word "junk". It's a pejorative term that implies assumptions about the intent of the original query. If my intent is to confirm that a top-level label doesn't exist, then "BLAH/IN/SOA" is a perfectly reasonable query for me to send to a root server. We might assume that a query "Joe's iPhone/IN/SOA" sent to a root server is not reasonable, but we're only assuming; we don't actually have a way of gauging the actual intent with any accuracy.
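The promised sketch of the qname minimisation point (my illustration, not anything from the draft; it assumes the dnspython library, and "tld_exists" and the negative cache are hypothetical helpers): a minimising resolver only ever exposes the TLD label to the root, so a single cached NXDOMAIN for "example." covers both "foo.example" and "bar.example".

    # Rough sketch: a qname-minimising check that sends only the TLD
    # label to the root. Assumes dnspython; the address used is
    # a.root-servers.net. "tld_exists" is my own illustrative helper.
    import dns.message
    import dns.query
    import dns.rcode

    negative_cache = set()                  # TLDs known not to exist

    def tld_exists(name, root="198.41.0.4"):
        tld = name.rstrip(".").split(".")[-1] + "."
        if tld in negative_cache:
            return False                    # cache hit: no root query at all
        q = dns.message.make_query(tld, "NS")
        r = dns.query.udp(q, root, timeout=5)
        if r.rcode() == dns.rcode.NXDOMAIN:
            negative_cache.add(tld)         # one entry covers every name
            return False                    # under this TLD
        return True

    tld_exists("foo.example.")  # one root query: "example./NS" -> NXDOMAIN
    tld_exists("bar.example.")  # no root query: negative cache hit

Sent with full qnames instead, those two lookups would produce two separate root queries and two separate negative cache entries.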
o DoS against the root service - By distributing the contents of the root to many recursive resolvers, the DoS protection for customers of the root servers is significantly increased. A DDoS may still be able to take down some recursive servers, but there is much more root service infrastructure to attack in order to be effective. Of course, there is still a zone distribution system that could be attacked (but it would need to be kept down for a much longer time to cause significant damage), and so far the root has stood up just fine to DDoS.

If I were to paraphrase this advantage with malicious intent :-), you mean that "we don't have to rely upon the root server system to continue to perform under attack, because we don't need the root server system any more, although we do need the new bits of the root server system we are specifying, and if those bits are not available we do need the conventional root server system after all, but that's probably ok because the root server system is pretty resilient". That sounds a bit circular.

o Small increase to privacy of requests - This also removes a place where attackers could collect information. Although query name minimization also achieves some of this, it does still leak the TLDs that people behind a resolver are querying for, which may in itself be a concern (for example, someone in a homophobic country who is querying for a name in .gay).

There's an implication here that a recursive resolver sending a query to a root server is potentially impinging upon the privacy of its anonymous clients. I find that a bit difficult to swallow.

I'm surprised not to see "improves performance for clients" in this list, on the general principle that every cache miss that triggers a query to a root server will take longer than consulting a pre-fetched root zone. I'm glad about that, though, since I think that performance improvement is (a) minuscule in normal operation, affecting 1/BIGNUM clients who expose a cache miss, and (b) also achievable in the steady state by resolvers that perform cache pre-fetching (e.g. hammer-like behaviour).

My overall summary for 5.1 is that there's no clear benefit in performance, reliability or stability from making this change.

5.2. Cons

o Loss of agility in making root zone changes - Currently, if there is an error in the root zone (or someone needs to make an emergency change), a new root zone can be created, and the root server operators can be notified and start serving the new zone quickly. Of course, this does not invalidate the bad information in (long TTL) cached answers. Notifying every recursive resolver is not feasible. Currently, an "oops" in the root zone will be cached for the TTL of the record by some percentage of servers. Using the technique described above, the information may be cached (by the same percentage of servers) for the refresh time + the TTL of the record.

A new root zone is published usually two (but sometimes more) times per day. The semantics specified in the draft for refreshing a local copy of the root zone say "keep re-using the copy you have until it expires". If I assume that "expire" means the copy survives until SOA.EXPIRE seconds after it was originally fetched, then there's the potential for stale data to be published for a week plus however old the originally-retrieved file was (which is difficult to determine, in contrast to the traditional root zone distribution scheme). I think this disadvantage is more serious than is presented.
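To put rough numbers on that (my arithmetic; the one-week figure is the root zone's SOA EXPIRE of 604800 seconds, while the TTL and file age below are illustrative assumptions, not measurements):

    # Back-of-envelope worst-case staleness window, per the argument above.
    SOA_EXPIRE = 604800   # root zone SOA EXPIRE: 7 days of permitted re-use
    RECORD_TTL = 172800   # assume a 2-day TTL on the "oops" record
    FILE_AGE   = 43200    # assume the fetched copy was already 12 hours old

    worst_case = FILE_AGE + SOA_EXPIRE + RECORD_TTL
    print(worst_case / 86400)   # 9.5 days of potentially stale data,
                                # versus RECORD_TTL alone (2 days) today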
o No central monitoring point - DNS operators lose the ability to monitor the root system. While there is work underway to implement better instrumentation of the root server system, this (potentially) removes the thing to monitor.

In fact there's exactly the same ability to monitor the root server system; it's just that the data available through such monitoring will be different (as you point out in the second sentence). OK, this one is a bit pedantic.

o Increased complexity in nameserver software and their operations - Any proposal for recursive servers to copy and serve the root inherently means more code to write and execute. Note that many recursive resolvers are on inexpensive home routers that are rarely (if ever) updated.

You don't require universal deployment for this scheme to work, so the long tail of DNS software upgrades is arguably not a great concern.

I think the increased complexity in operations is significant, though. My observation is that people already have enough difficulty troubleshooting DNS problems, and adding in a brand new set of services for root zone distribution that also need to be considered (potentially a set of services that are not subject to the same internal and external scrutiny as root server operations, and which are potentially operated by people who are even less familiar and easy to reach than root server operators are) is only going to make things worse. If there were a significant benefit to performance or reliability to balance that increased operational complexity, I might see a reason to do it. I don't see that benefit, though.

o Changes the nature and distribution of traffic hitting the root servers - If all the "good" recursive resolvers deploy root copying, then root servers end up servicing only "bad" recursive resolvers and attack traffic. The roots (could) become what AS112 is for RFC1918.

The difference is that queries directed at AS112 servers are definitively junk. They are requests for names that cannot exist with global uniqueness, since they correspond to infrastructure that is not globally unique. By contrast, the root servers will always receive queries that are important to answer (from an end-user's perspective), even if the proportion of such queries declines following the unexpected and optimistic changes in resolver behaviour implied in that paragraph.

Architecturally, I don't think you improve the quality of a service by reducing the impact of failure and giving the operators busy-work to fill their time.

Joe