[DNSOP] Thoughts about draft-wkumari-dnsop-extended-error

Edward Lewis Wed, 26 Jul 2017 08:35:04 -0700

First there's a need to divide and conquer.  Or maybe conquer a different 
target.


Why express (return) an error condition notification?  To let the requestor 
know what happened?

I don't think that's sufficient or even the true goal.

Another reason for returning an error notice is to tell the requestor what they 
ought to do next.

In DNS there are a few choices.  Here are some (I may miss "corner cases"):

1 - try again in 5 seconds
2 - try a different server authoritative for the zone
3 - the zone (admin) says "no"
4 - try a different recursive server
5 - don't try again (ever, for a while) for this name or anything "under" it
6 - change your query to something else (like HTTP's permanent redirect)
7 - probably other reactions

Note that the above list is meant to be independent of what went wrong.
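
As a rough illustration only, the list above could be modelled as a small, 
closed set of reactions that carries no information about the cause.  Every 
name below (Reaction, Advice, next_step) is hypothetical and does not come from 
the draft or from any DNS library.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Reaction(Enum):
    """What the requestor ought to do next (hypothetical names)."""
    RETRY_LATER = 1              # 1 - try again after a delay
    TRY_OTHER_AUTHORITATIVE = 2  # 2 - another server authoritative for the zone
    POLICY_DENIED = 3            # 3 - the zone (admin) says "no"
    TRY_OTHER_RECURSIVE = 4      # 4 - a different recursive server
    STOP_FOR_SUBTREE = 5         # 5 - stop asking for this name and below
    REWRITE_QUERY = 6            # 6 - ask for something else instead


@dataclass(frozen=True)
class Advice:
    """A reaction plus the few parameters a requestor needs to act on it."""
    reaction: Reaction
    delay_seconds: Optional[int] = None  # RETRY_LATER, STOP_FOR_SUBTREE
    new_qname: Optional[str] = None      # REWRITE_QUERY


def next_step(advice: Advice) -> str:
    """Turn the advice into an action; note that no cause is consulted."""
    r = advice.reaction
    if r is Reaction.RETRY_LATER:
        return f"retry same server in {advice.delay_seconds}s"
    if r is Reaction.TRY_OTHER_AUTHORITATIVE:
        return "retry at another authoritative server"
    if r is Reaction.POLICY_DENIED:
        return "give up: denied by zone policy"
    if r is Reaction.TRY_OTHER_RECURSIVE:
        return "retry at another recursive server"
    if r is Reaction.STOP_FOR_SUBTREE:
        if advice.delay_seconds is None:
            return "suppress queries at or below the name"
        return f"suppress queries at or below the name for {advice.delay_seconds}s"
    if r is Reaction.REWRITE_QUERY:
        return f"re-query as {advice.new_qname}"
    raise ValueError(f"unknown reaction: {r!r}")
```

For example, next_step(Advice(Reaction.RETRY_LATER, delay_seconds=5)) yields an 
instruction to retry in 5 seconds, whatever the server-side reason was.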

The other reason is to inform the requestor.  Doing some navel gazing, why do 
this?  For the most part, the other end only needs to know how to react as 
above. We do have a great tradition of wanting to know why a service someone 
else runs misbehaves in our eyes.  This can be good, helping in the 
identification of problems, but as operations become more "professional-grade" 
this may be an outdated romantic notion.  (Although in a speed test of 
public-twitter-angst vs. nagios-alerts, twitter has proven to win at least some 
races.)

But if we are going to get into explaining why, here are some considerations.

One, different code bases will have myriads of errors to report that are 
related to their own threads.  (Recall old-time BIND INSISTs?)  Do you want to 
have those reported?

Two, there are protocol "errors" - so long as we don't add to the mythical DNS 
server protocol state machine, the set of errors can be enumerated.

I can see that knowing whether a state is transient or permanent can indicate 
what a requestor ought to do, but then see the first part of the message.  The 
difference between transient and permanent may be just a perception of time 
scale, with permanent ending at a reboot.

E.g., what if a resolver gets a response and finds DNSSEC telling it to reject 
the data for X seconds?  For those X seconds the resolver will not send a 
response, so the condition is "permanent" in some sense.  It is transient in 
that after X seconds the negative DNSSEC cache will expire the lesson learned.
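
A minimal sketch of that time-bounded "permanence", assuming a hypothetical 
RejectCache class (not taken from any resolver implementation): a name is 
rejected until its timer runs out, and the same state reads as permanent or 
transient depending only on when you look.

```python
import time


class RejectCache:
    """Remembers, per name, until when answers must be rejected (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the behaviour can be tested
        self._until = {}      # name -> time at which the rejection expires

    def reject(self, name, seconds):
        """Record that data for `name` is to be rejected for `seconds`."""
        self._until[name] = self._clock() + seconds

    def is_rejected(self, name):
        """True while the rejection is in force; the entry lapses afterwards."""
        expiry = self._until.get(name)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._until[name]   # the "lesson learned" has expired
            return False
        return True
```

Seen from the requestor, the only useful output is the remaining time, which 
is again a "what to do next" rather than a "what went wrong".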

In short - instead of error conditions, define "exceptional reactions" a 
requestor ought to pursue.  This will probably be much more quantifiable (in 
that the mythical protocol state machine for the requestor is simpler than the 
server's and less likely to be radically changed in the future).

And - you will get away from having to localize the error explanation into 
different human languages.  (I hadn't forgotten this, but this issue is a red 
herring.)  The DNS protocol need not be human friendly; it's meant for machines 
to talk to machines.  Trying to make it talk to people at the level people 
understand might just be too large a task.

Mental exercise: what does a querier do when it sees NOTIMP now?  Switch 
servers and hope?  REFUSED vs. SERVFAIL for lame delegation, which is better?  
Protocol-wise, "what happened" isn't all that useful; "what to do next" would 
be.
