I don't personally think caching NXDOMAIN is bad per se: the question is
what negative cache time to use, and what the consequences are when the
zone delegation structure changes in order to achieve DDoS mitigation.
When there is no DDoS, you want the cache to do its job. When there is,
you want to be able to control the cache behaviour. Hard, in an
unconstrained, hinted, distributed system.
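A minimal sketch of what that control might look like: a cache that scales
down its negative-cache retention when query load crosses a threshold. All
class names, thresholds, and the scaling policy here are hypothetical
illustrations, not any shipping resolver's behaviour:

```python
# Hypothetical sketch: lower negative-cache (NXDOMAIN) retention under load
# so the cache adapts faster to upstream delegation changes. Names and
# thresholds are illustrative only.
import time


class NegativeCache:
    def __init__(self, load_threshold_qps=10_000, n=10):
        self.entries = {}          # qname -> expiry time (monotonic seconds)
        self.load_threshold_qps = load_threshold_qps
        self.n = n                 # divisor applied to the TTL under load
        self.current_qps = 0       # updated elsewhere by a load monitor

    def effective_ttl(self, asserted_ttl):
        # Under load, deliberately retain negative state for 1/n of the
        # asserted TTL (never below 1 second).
        if self.current_qps > self.load_threshold_qps:
            return max(1, asserted_ttl // self.n)
        return asserted_ttl

    def store_nxdomain(self, qname, asserted_ttl):
        self.entries[qname] = time.monotonic() + self.effective_ttl(asserted_ttl)

    def is_cached_nxdomain(self, qname):
        expiry = self.entries.get(qname)
        if expiry is None:
            return False
        if time.monotonic() >= expiry:
            del self.entries[qname]   # expired: forget the negative answer
            return False
        return True
```

The point of the sketch is only that the knob lives entirely in the cache;
nothing in the protocol has to change for a resolver to shed negative state
faster when it is busy.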
Since the TTL is set in the parent of the delegation, the defence requires
both consideration by the parent of what TTL to assert over the zone, *and*
ownership of a time machine to adjust it down when under attack. I'm
delighted to seek funds to do the time machine part. Somebody else is going
to have to get funded to do the cache coherence and flushing protocol, in-
or out-of-band.

How about if, under load, a cache is permitted to convert the NXDOMAIN TTL
to 1/nth of the apparent TTL, based on some understood algorithm tied to a
load threshold? I.e., under load, the cache deliberately lowers retention
on negative cache state, so it can adapt to changes in that negativity?

-G

On Wed, Mar 16, 2016 at 10:23 AM, Ted Lemon <ted.le...@nominum.com> wrote:
>> this is getting pretty good. anyone who stopped reading before now may
>> want to delve back in at this point.
>
> I on the other hand am a little frustrated because a while back I thought
> we agreed, and now it appears that we don't.
>
>> an authority server operator experiencing a PRSD DDoS might wish to add
>> a zone cut or even remove a name in order to manage their defense costs.
>> when they hand out authoritative content to recursive servers, they will
>> find that some recursive servers will honor the clarified subdomain
>> semantics of nxdomain and others will not. this is not an
>> interoperability problem, but it's a very real problem well deserving
>> our attention here.
>
> The obvious place to do a PRSD attack is on a subdomain that contains
> valid data, so that you can't NXDOMAIN the parent domain. So adding a
> zone cut appears unlikely to help. Do you know of a situation where it
> would?
>
>> you're distracting me. i used to be a programmer and this is an
>> interesting problem.
>> as others have pointed out up-thread, you don't have to do the purge at
>> nxdomain time; you can purge lazily when you are responding on an
>> affected qname; or you can purge never and just send the nxdomain
>> instead of the unreachable content, and let the unreachable content
>> expire naturally (TTL or LRU or whatever.) or you can refactor your
>> cache to be a hierarchy of hash tables rather than a flat hash table.
>
> You can't really purge lazily. TTL or LRU are your best bets. Purging
> lazily requires you to take the performance hit John and I were arguing
> about. Sure, it can be done, but ouch.
>
>> however, back to the topic at hand, the implementation costs are not
>> germane to system correctness. if nominum CNS and/or nlnetlabs Unbound
>> can't implement the clarified nxdomain semantics, then they just won't.
>> your customers and your investors can decide what this means to them.
>
> This is the core of our disagreement, I think. You appear now to be
> saying that system correctness demands that a cache purge subdomains of
> an NXDOMAIN. That simply isn't true. An answer whose TTL hasn't expired
> is a valid answer. A caching server that responds to a query for a
> cached name with the data that was cached is operating correctly. The
> product name is Vantio Cacheserve 7, by the way, and to quote a
> well-loved customer, it is faster than greased bat****.
>
>> note that interposing a new zone cut should ideally also cause a purge,
>> either real or effective. this is nowhere written down, but has the same
>> reasoning: a new authority ought to be given a fresh chance to populate
>> caches. this is related to, and similar to, but not the same as the
>> ns-rrset reverification logic proposed in resimprove-00, and will cause
>> very similar implementation problems for flat-hash caches.
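The "hierarchy of hash tables" refactor quoted above can be sketched as a
toy: if the cache is keyed label-by-label from the root, then "purge
everything at and below this name" is a single subtree delete rather than a
scan over a flat table. The class and method names below are made up for
illustration, not any real resolver's internals:

```python
# Hypothetical sketch: a cache keyed as a tree of DNS labels, so an
# NXDOMAIN-triggered subtree purge is O(1) node surgery instead of a
# full-table scan. Illustrative only.


class CacheNode:
    def __init__(self):
        self.children = {}   # label -> CacheNode
        self.rrsets = {}     # rrtype -> cached data

    def _walk(self, qname, create=False):
        # DNS names are hierarchical right-to-left: www.example.com is
        # traversed as com, then example, then www.
        node = self
        for label in reversed(qname.lower().rstrip(".").split(".")):
            child = node.children.get(label)
            if child is None:
                if not create:
                    return None
                child = node.children[label] = CacheNode()
            node = child
        return node

    def store(self, qname, rrtype, data):
        self._walk(qname, create=True).rrsets[rrtype] = data

    def lookup(self, qname, rrtype):
        node = self._walk(qname)
        return None if node is None else node.rrsets.get(rrtype)

    def purge_subtree(self, qname):
        # On NXDOMAIN for qname, drop qname's own data and every name
        # below it in one operation.
        node = self._walk(qname)
        if node is not None:
            node.rrsets.clear()
            node.children.clear()
```

The trade-off is the one the quote names: a flat hash gives cheaper single
lookups, while the tree makes the clarified-NXDOMAIN semantics cheap to
honor.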
>
> I agree that this will produce a more consistent view of the DNS, but
> these consistency changes will occur by happenstance, not reliably, so
> they don't really make the view through a cache significantly more
> correct. And there are better ways to make bouillabaisse than to boil
> the ocean--this would be a very expensive way to get a not very big
> improvement in consistency.
>
>> inconsistency is in that sense a known hazard, but not a benefit. we
>> call the system "best efforts" because we know it won't be consistent
>> but we want everyone involved to do their best anyway.
>
> It's true that inconsistency is not a benefit, but it's standard
> operating procedure, and we have to deal with it every day. One reason
> we have to deal with it is that, quite annoyingly, DNS caches cache
> NXDOMAINs. Ping a nonexistent host, add its name to the authoritative
> server, and ping it again, and you will get a "host unknown" error both
> times, because the NXDOMAIN from the first ping attempt was cached.
> This is bad behavior that we have codified into a standard because it
> improves performance.
>
> So if you want to tell me that, for the sake of correctness, I have to
> go purge entries out of my hashed cache because I got an NXDOMAIN, then
> I will tell you for the sake of correctness that we should never cache
> NXDOMAINs, and we will both glare at each other sternly until one of us
> cracks a smile. This is a silly argument. If correctness were our
> first and only concern, DDoS attacks would be even easier than they are.
>
> _______________________________________________
> DNSOP mailing list
> DNSOP@ietf.org
> https://www.ietf.org/mailman/listinfo/dnsop