Re: [DNSOP] [Ext] Re: General comment about downgrades vs. setting expectations in protocol definitions

Ben Schwartz Wed, 14 Feb 2024 08:34:04 -0800

I agree that SVCB can usually be thought of as descriptive, not prescriptive.  
The publisher provides information about their service, and the recipient makes 
use of it in some reasonable way.  For the "testing" flag, the descriptive 
information is basically "this endpoint does not carry my SLA".

I don't think the existence of server support for a less-secure protocol is 
sufficient signal.  We should plan to spend decades with some resolvers 
implementing downgrade-resistant DoT, some implementing DoT with fallback 
heuristics, and some resolvers not implementing DoT at all.  During that 
period, auth servers won't be able to disable Do53, so we won't be able to use 
that as a signal about the reliability of the DoT service.

You can see a variation on this problem in draft-ietf-tls-svcb-ech, which says 
that ECH-aware clients can distinguish between "fail open" and "fail closed" by 
whether ECH is offered on all records in the ServiceMode RRset.  This works 
because ECH-aware clients never fall back from ECH to non-ECH within a single 
ServiceMode record, so "fail closed" is expressible by offering ECH on every 
record in the RRset.
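
As a rough illustration of that rule (a minimal sketch only, not text from the 
draft): the ServiceModeRecord class and the ech_failure_policy function below 
are hypothetical names, and SvcParams are modeled as a plain dict keyed by 
parameter name.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceModeRecord:
    # Hypothetical, simplified model of one SVCB/HTTPS ServiceMode record.
    priority: int
    target: str
    params: dict = field(default_factory=dict)  # e.g. {"ech": b"..."}


def ech_failure_policy(rrset):
    """Classify an RRset as 'fail closed' or 'fail open' for ECH.

    Assumption: the publisher means 'fail closed' exactly when every
    ServiceMode record in the RRset carries an "ech" parameter.
    """
    if not rrset:
        raise ValueError("empty RRset")
    if all("ech" in record.params for record in rrset):
        return "fail closed"
    return "fail open"


all_ech = [
    ServiceModeRecord(1, "a.example.", {"ech": b"config-a"}),
    ServiceModeRecord(2, "b.example.", {"ech": b"config-b"}),
]
mixed = [
    ServiceModeRecord(1, "a.example.", {"ech": b"config-a"}),
    ServiceModeRecord(2, "b.example."),  # no ECH offered on this record
]
```

Here ech_failure_policy(all_ech) classifies the RRset as "fail closed", while 
the single non-ECH record in mixed makes the whole RRset "fail open".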

We currently don't have a no-fallback rule like that for encrypted transports 
in DELEG.  We could certainly add one, but doing so would likely double the 
number of DELEG ServiceMode records for decades.  That's inconvenient, 
especially if these records start to include nontrivial payloads.

--Ben
________________________________
From: DNSOP <dnsop-boun...@ietf.org> on behalf of Edward Lewis 
<edward.le...@icann.org>
Sent: Wednesday, February 14, 2024 7:23 AM
To: Manu Bretelle <chan...@gmail.com>
Cc: dnsop@ietf.org <dnsop@ietf.org>
Subject: Re: [DNSOP] [Ext] Re: General comment about downgrades vs. setting 
expectations in protocol definitions

From: Manu Bretelle <chan...@gmail.com>
Date: Tuesday, February 13, 2024 at 19:03
To: Edward Lewis <edward.le...@icann.org>
Cc: "dnsop@ietf.org" <dnsop@ietf.org>
Subject: Re: [DNSOP] [Ext] Re: General comment about downgrades vs. setting 
expectations in protocol definitions

First - why am I resisting this proposal?  I believe that for the sake of 
operations, development of protocols must trend towards simplicity.  I would 
add a flag or field when necessary and only then, lest it be forgotten (a 
burden with no benefit upon code maintainers) or worse a stumbling block 
(misused, mis-set, generally mis-understood).

On Tue, Feb 13, 2024 at 7:35 AM Edward Lewis <edward.le...@icann.org> wrote:

>An operator dipping its toes with DELEG and encrypted protocols may be willing 
>to signal to a resolver that such failures are likely operational failures 
>because this is a testing endpoint that may be unstable due to lack of 
>operational expertise. A privacy aware resolver can then decide to fall back on 
>clear-text. Again, there is nothing preventing the resolver from failing hard 
>here; this is out of the control of the auth server operator. All that can be 
>done is to "signal".

Wouldn’t the availability of the fallback transport be enough signal that the 
service operator does not have full faith in the preferred transport?  Having a 
separate flag is like having a second source of data: the two might be 
inconsistent, which is a common root cause of failures.

>I could also imagine an operator going through their first cert rotation to be 
>erring on the side of safety and switching to "testing" mode temporarily.

A bit of my concern is that sometimes we forget to remove the training wheels 
once we’ve learned.  A common error in operations is to forget the cleanup 
phase (remove old files, etc.) once new functionality has been proven.  This is 
a reason why I’m hesitant to support having a flag like this.

>If you look back at DNSSEC, had it been possible to turn on DNSSEC in 
>"permissive" mode, would more operators have taken the leap to enable it 
>knowing that resolvers that would validate records would have been willing to 
>fall back while the flag is on? I think from an operational point of view, this 
>is something that can be of great help to build operational confidence and 
>expertise without taking the risk of breaking one's DNS.

Yes, yes it would.  Early on there was criticism that DNSSEC was “ok” or 
“fail”.  When operators messed up their key rotations (this happened quite 
often around 2010), there were calls to “purge caches” and even some thought 
given to automating a way for operators to initiate a global cache purge of 
their data.  (Failed, of course - there’s no way.)  This was followed by the 
development of negative trust anchors after the COMCAST/NASA.gov issue, 
something that was an uphill battle by operators to get documented in an IETF 
document.  More recently, an operator asked me about developing a new 
resource record type that could be published at a zone apex to signal that all 
validations records signed by the apex keyset ought to be ignored.  (Sketched 
up, but not what the operator had in mind.)

Operators list the great leap of risk as a reason not to implement DNSSEC.  The 
protocol design did not accommodate a soft introduction.  The levels of 
certainty are binary - thumbs up or thumbs down thanks to the reliance on the 
DNS response code as the only error channel.

When I wrote a prototype validator during experimentation on DNSSEC, I realized 
that there were 50 or so if statements, any one of which would cause validation 
to fail.  Some of the ifs were likely transient, some persistent, and so on; 
this information could have informed the response.  But we didn’t have enough 
bandwidth (that response code field was all) to feed that back up the chain.  
We probably then ought to have defined an extended response code mechanism - 
which is now a current work in progress in DNSOP, if I’m right.
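
To make the "not enough bandwidth" point concrete, here is a minimal sketch, 
not the prototype validator itself.  The Failure names and their 
transient/persistent labels are hypothetical; the only real protocol value is 
SERVFAIL (RCODE 2), which is what a validating resolver returns when 
validation fails.

```python
from enum import Enum

SERVFAIL = 2  # DNS RCODE 2, the single error channel available


class Failure(Enum):
    # Hypothetical validation failures: (description, is_transient)
    SIGNATURE_EXPIRED = ("signature validity period has ended", False)
    NO_MATCHING_KEY = ("no DNSKEY matches the signature", False)
    KEY_FETCH_TIMEOUT = ("timed out fetching the DNSKEY RRset", True)

    @property
    def transient(self):
        return self.value[1]


def rcode_for(failure):
    # Every distinct failure, transient or persistent, collapses into
    # the same response code, so the detail never reaches the client.
    return SERVFAIL


rcodes = {rcode_for(f) for f in Failure}
```

Whatever the cause, rcodes ends up holding a single value, so the querier 
cannot tell a timeout from an expired signature.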

In summary - I think this flag would be redundant to the availability of a 
means to fall back.  Basing the justification on “testing phase” assumes that it 
is a distinct phase with a declared ending - which I don’t believe is often 
true.  And I think we do need to build in ways to lower the risk of adoption 
(initial or otherwise): one way is via better feedback, others are via 
abilities to “test-in-prod” (“immediate trial period, when staff is able to 
watch it launch before leaving for lunch”) and so on.

