On Thu, Dec 14, 2017 at 07:00:58PM +0100, bert hubert wrote:
> On Thu, Dec 14, 2017 at 11:09:13PM +0530, Mukund Sivaraman wrote:
> > Any appetite for it? Don't throw things at me.. I ask because the
> > current thing is slowly getting more widely deployed and there are
> > design issues that can do with an ECS2 that breaks from ECS1 protocol. I
> > ask because I'm once again having to deal with myriad implementation
> > cases and dislike it.
> 
> Could you elaborate what you dislike most?

It is too complicated to implement ECS correctly. There are a large
number of corner cases. The things that resolvers and authoritative
sides have to take care of are quite different. It is more complex
than anything else in DNS.

I think this should be built again from scratch.

> The biggest thing we are noticing is that while it does great things
> to getting to a server the content provider likes, it unavoidably
> drives down cache hitrates a lot, introducing a latency penalty.
> 
> The operators we see deploying ECS have tens of thousands of subnets
> which all need to be mapped to only a few servers. But you still end up
> with tens of thousands of cache entries and therefore tiny cache hitrates.
> 
> Such things could be addressed by answering with lists of subnet masks to
> which this answer would also apply, but this makes little sense
> operationally I think. 

Firstly, correct deaggregation is an important requirement for reducing
cache usage. With the current design of the ECS protocol, it's very
important that prefixes be split into disjoint pieces optimally to avoid
cache pollution, yet the draft does not specify a suitable algorithm for
it (we know how to do it, but I think the draft should have stated it).

A /n address prefix as specified by the ECS option is a perfect binary
tree of 1<<(32-n) addresses (for IPv4). To correctly deaggregate
0.0.0.0/0 (scope=0) data from a longer prefix such as 10.0.0.0/24, all
of these answers have to be generated:

* 10.0.0.0/24 answer
* 10.0.0.0/23 exact match answer (scope > source)
* 10.0.1.0/23 answer
* 10.0.0.0/22 exact match answer (scope > source)
* 10.0.3.0/22 answer

and so on. About 2n+1 answers are necessary so that a 0.0.0.0/0
answer does not prevent a /n client from receiving its specific answer.

                  x
           x            x
         x   x        x   x
        x x x x      x x y y

If ECS option had more fields, we could have put the above pattern as
a difference of trees (with direction bit and height, e.g., "x"s in
the diagram above) and it would have reduced cache usage
considerably.
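
A minimal sketch of the deaggregation described above, using Python's
standard ipaddress module. The function name deaggregate is made up for
illustration, and the sibling prefixes are printed in canonical CIDR
form (host bits cleared), so they are not spelled the same way as in
the list above. Counting the n+1 nodes on the path from /0 down to /n
plus the n sibling subtrees is one way to arrive at 2n+1; that reading
of the count is an assumption.

```python
import ipaddress

def deaggregate(covering, specific):
    """Split `covering` around `specific`.

    Returns (path, siblings): the nodes on the path from `covering` down
    to `specific`, and the sibling subtrees that hang off that path.
    """
    covering = ipaddress.ip_network(covering)
    specific = ipaddress.ip_network(specific)
    # Nodes on the path, shortest prefix first, ending with `specific`.
    path = [specific.supernet(new_prefix=k)
            for k in range(covering.prefixlen, specific.prefixlen)]
    path.append(specific)
    # address_exclude() yields one sibling subtree per level of the tree.
    siblings = sorted(covering.address_exclude(specific),
                      key=lambda net: net.prefixlen, reverse=True)
    return path, siblings

path, siblings = deaggregate("0.0.0.0/0", "10.0.0.0/24")
print([str(net) for net in siblings[:3]])  # the three longest siblings
print(len(path) + len(siblings))           # 2n+1 for n = 24
```

Every one of those entries has to be learned through a separate query
before a cache can safely hold both the /0 data and the /24 data.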

This can be generalized with more differences but anyway I think that
using QUERY for ECS is a badly done idea (not even mentioning privacy
loss).

As an example, RPZ does not rely on queries. It transfers all prefixes
for matching in a zone, so the longest prefix match algorithm does not
suffer from a previously cached shorter prefix matching first and
preventing future fetches.

Another related problem is this: We often want to match against a GeoIP
database (containing what may be changing but maintained prefixes) in
associating zone data with geographic/network-topo clients. We want to
say "serve this answer for country X or city Y or ASNNNN" and we don't
care about managing the actual prefixes.

GeoIP is a custom database format, where I can match against GeoIP, but
I can't easily deaggregate all its prefixes for an ECS zone to be cached
properly.

I feel it would have been better to say to a downstream resolver "This
answer is for country X", or "A is the answer for country X, B is the
answer for country Y, the rest use answer C".
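
To make the idea concrete, here is a rough sketch of what such an
answer could look like on the resolver side. Everything in it is
hypothetical: the record layout, the names ANSWERS, DEFAULT_ANSWER and
LOCAL_GEO, and the assumption that the resolver has its own
prefix-to-country table. None of this comes from a draft.

```python
import ipaddress

# Hypothetical answer set as handed down by the authority:
# "A for country X, B for country Y, the rest use C".
ANSWERS = {"X": "A", "Y": "B"}
DEFAULT_ANSWER = "C"

# Hypothetical local GeoIP-like table kept by the resolver itself.
LOCAL_GEO = {
    ipaddress.ip_network("198.51.100.0/24"): "X",
    ipaddress.ip_network("203.0.113.0/24"): "Y",
}

def country_of(client):
    """Longest-prefix match of the client address in the local table."""
    ip = ipaddress.ip_address(client)
    matches = [net for net in LOCAL_GEO if ip in net]
    if not matches:
        return None
    return LOCAL_GEO[max(matches, key=lambda net: net.prefixlen)]

def answer_for_client(client):
    """Pick the answer by country, falling back to the default."""
    return ANSWERS.get(country_of(client), DEFAULT_ANSWER)

print(answer_for_client("198.51.100.7"))
print(answer_for_client("192.0.2.1"))
```

The cache then holds three answers for the name, however many prefixes
the GeoIP data happens to contain.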

The design of ECS needs to be reconsidered. I'd prefer something like a
zone format for it rather than using QUERY. QUERY cannot give complete
information about all prefixes, so there is a possibility of incorrect
caching and a very high probability of redundant cache pollution.

From a resolver's point of view, a non-ECS answer (no client-subnet
option) is different from an ECS answer with scope=0 which is different
from an ECS answer with source=0, whereas all these may be the same from
an authority's point of view. They all need to be cached differently
(from an intermediate resolver's view).
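
A small sketch of the distinction a resolver has to keep, assuming the
response has already been parsed into a source prefix length and a
scope prefix length (both None when the option is absent). The names
AnswerKind and classify are invented for this example.

```python
from enum import Enum
from typing import Optional

class AnswerKind(Enum):
    NON_ECS = "no client-subnet option in the response"
    SOURCE_ZERO = "ECS response to a source=0 query"
    SCOPE_ZERO = "ECS response with scope=0"
    SCOPED = "ECS response with a non-zero scope"

def classify(source_len: Optional[int], scope_len: Optional[int]) -> AnswerKind:
    """Decide which cache category a response falls into."""
    if source_len is None or scope_len is None:
        return AnswerKind.NON_ECS
    if source_len == 0:
        return AnswerKind.SOURCE_ZERO
    if scope_len == 0:
        return AnswerKind.SCOPE_ZERO
    return AnswerKind.SCOPED

print(classify(None, None).name)
print(classify(0, 0).name)
print(classify(24, 0).name)
```

The first three categories can carry identical records from the
authority, yet the resolver cannot merge them into one cache entry.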

This thing of scope > source meaning for-exact-match-only is weird as
hell when implementing longest prefix matching. It is not convenient
to use an off-the-shelf radix tree.
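
A sketch of why a plain longest-prefix match is not enough, assuming
that an entry cached from a scope > source response may only be used
when the query's source prefix is identical to the cached prefix. The
Entry type and the lookup function are illustrative, not taken from any
implementation.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    prefix: ipaddress.IPv4Network
    answer: str
    exact_only: bool = False  # set when the response had scope > source

def lookup(entries, source):
    """Longest-prefix match that skips exact-only entries unless equal."""
    source = ipaddress.ip_network(source)
    usable = [e for e in entries
              if source.subnet_of(e.prefix)
              and (not e.exact_only or e.prefix == source)]
    if not usable:
        return None
    return max(usable, key=lambda e: e.prefix.prefixlen).answer

entries = [
    Entry(ipaddress.ip_network("10.0.0.0/16"), "A"),
    Entry(ipaddress.ip_network("10.0.0.0/24"), "B", exact_only=True),
]
print(lookup(entries, "10.0.0.0/24"))    # exact match, B is usable
print(lookup(entries, "10.0.0.128/25"))  # B is skipped, falls back to A
```

A stock radix tree would return the /24 entry for the second lookup,
which is why the extra per-entry flag and check are needed.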

ECS relies on the option always being returned for any kind of
answer, as some resolvers use that as an indicator of ECS support
(and stop using ECS if it ever stops). But ECS does not apply to
several kinds of answers (e.g., anything but NOERROR; NXDOMAIN and
NODATA in particular have to be consistent across all prefixes). It
doesn't apply to SOA, DNSKEY, NS in the answer section, referrals,
etc. Yet many of these need to be answered with SCOPE=0.

An ACL config option about whether the NS supports ECS or not (to
return the option or not) is different from a config option about
whether the NS passes through ECS or not: the latter would always pass
through SOURCE=0 but return REFUSED for any ECS queries that didn't
match the ACL, whereas the former would return a non-ECS reply for any
ECS queries that didn't match the ACL.

Transitivity of the option has corner cases.

I don't have to point out how easy it is for an erroneous /16 to
prevent queries for /24 answers shadowed by the /16.
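
The shadowing can be reproduced with a toy cache. This is only a
sketch: TinyEcsCache and faulty_authority are made-up names, and the
authority is modelled as a function that returns a scope prefix and an
answer for a client address.

```python
import ipaddress

class TinyEcsCache:
    """Toy resolver cache: answer from any covering prefix, else fetch."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.cache = {}    # scope prefix -> answer
        self.fetches = 0   # number of upstream queries made

    def resolve(self, client):
        ip = ipaddress.ip_address(client)
        covering = [net for net in self.cache if ip in net]
        if covering:
            best = max(covering, key=lambda net: net.prefixlen)
            return self.cache[best]
        self.fetches += 1
        scope, answer = self.upstream(ip)
        self.cache[ipaddress.ip_network(scope)] = answer
        return answer

def faulty_authority(ip):
    """Has a specific /24 answer, but wrongly scopes the rest as a /16."""
    if ip in ipaddress.ip_network("10.1.2.0/24"):
        return "10.1.2.0/24", "B"
    return "10.1.0.0/16", "A"  # erroneous scope: it covers 10.1.2.0/24

resolver = TinyEcsCache(faulty_authority)
first = resolver.resolve("10.1.0.5")   # caches 10.1.0.0/16 -> A
second = resolver.resolve("10.1.2.9")  # served from the /16, never fetched
print(first, second, resolver.fetches)
```

Until the /16 expires, clients in 10.1.2.0/24 keep getting A and the
authority is never asked for B.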

Some cache cases: Obviously an ECS cache is different from a
zone. It's not from a single zone, and it is not an atomic collection
of a single version of a zone; it is ever changing. If there's a /24
answer in cache, and a newer query brings in a /16 answer that shadows it,
should the resolver assume that the /16 has precedence because it's
newer (hence the /24 should no longer exist) or do a
longest-prefix-match against the older /24? What if the /16 then
expires and the /24 hasn't expired? An NXDOMAIN answer should expire
any previously cached prefix-specific cache entries for that name. A
NODATA answer should expire any previously cached prefix-specific
cache entries for that type.  Non-ECS data is different from SCOPE=0
data. There are questions about trust ranking with usage of ECS data.
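
The two negative-answer rules above could be sketched as follows,
assuming a cache keyed by (name, type, prefix). The class and method
names are invented for illustration; TTLs, DNSSEC and trust ranking are
left out.

```python
class PrefixCache:
    """Toy ECS cache keyed by (name, rtype, prefix)."""

    def __init__(self):
        self.entries = {}

    def store(self, name, rtype, prefix, answer):
        self.entries[(name, rtype, prefix)] = answer

    def on_nxdomain(self, name):
        # NXDOMAIN: drop every prefix-specific entry for the name.
        self.entries = {key: value for key, value in self.entries.items()
                        if key[0] != name}

    def on_nodata(self, name, rtype):
        # NODATA: drop every prefix-specific entry for that name and type.
        self.entries = {key: value for key, value in self.entries.items()
                        if (key[0], key[1]) != (name, rtype)}

cache = PrefixCache()
cache.store("www.example.", "A", "10.0.0.0/24", "192.0.2.1")
cache.store("www.example.", "AAAA", "10.0.0.0/24", "2001:db8::1")
cache.store("ftp.example.", "A", "10.0.0.0/16", "192.0.2.2")

cache.on_nodata("www.example.", "A")
print(sorted(cache.entries))
cache.on_nxdomain("www.example.")
print(sorted(cache.entries))
```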

These are just some topics that I can quickly think of. There are many
other issues we faced and discussed during resolver ECS development.

The draft leaves many things unspecified, such as more clarity in DNSSEC
and handling of negative answers. Many issues were fixed during the
draft phase, but I feel it was insufficient.

> Can you share your ideas for ECS2?

There are many quirks in ECS. I don't want to propose specific ideas
now, except that we should gather requirements and start from
scratch. We have to reduce complexity of the protocol on both auth and
caching resolver sides. I think it should be designed again from
requirements without being a tweak of ECS1. The current protocol
complicates DNS implementation significantly.

                Mukund

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
