I agree that SVCB can usually be thought of as descriptive, not prescriptive. The publisher provides information about their service, and the recipient makes use of it in some reasonable way. For the "testing" flag, the descriptive information is basically "this endpoint does not carry my SLA".
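To make "descriptive, not prescriptive" concrete, here is a minimal Python sketch of how a resolver might consume such a flag. It is an illustration under stated assumptions, not DELEG syntax: `Endpoint`, `Policy`, `may_fall_back` and the boolean `testing` field are all hypothetical names. It shows that the flag only informs the decision, while each resolver's own policy decides whether to fall back to Do53.

```python
from dataclasses import dataclass
from enum import Enum


class Transport(Enum):
    DO53 = "do53"  # classic clear-text DNS over port 53
    DOT = "dot"    # DNS over TLS


class Policy(Enum):
    STRICT = "strict"                # never downgrade to clear text
    OPPORTUNISTIC = "opportunistic"  # always allow a downgrade
    DEFER_TO_PUBLISHER = "defer"     # downgrade only if the publisher says "no SLA"


@dataclass(frozen=True)
class Endpoint:
    host: str
    transport: Transport
    # Hypothetical flag from this thread: descriptive only, meaning
    # "this endpoint does not carry my SLA". Not a defined DELEG parameter.
    testing: bool = False


def may_fall_back(policy: Policy, failed: Endpoint) -> bool:
    """Decide whether a resolver retries over Do53 after `failed` did not answer.

    The flag only feeds into the decision; the resolver's own policy has the
    final say, so a STRICT resolver fails hard even for a testing endpoint.
    """
    if failed.transport is Transport.DO53:
        return False  # already clear text; there is nothing to fall back to
    if policy is Policy.STRICT:
        return False
    if policy is Policy.OPPORTUNISTIC:
        return True
    return failed.testing  # DEFER_TO_PUBLISHER: follow the descriptive hint


trial = Endpoint("ns1.example.", Transport.DOT, testing=True)
print(may_fall_back(Policy.STRICT, trial))              # False
print(may_fall_back(Policy.DEFER_TO_PUBLISHER, trial))  # True
```

Under this reading a STRICT resolver ignores the flag entirely, which matches the point made further down the thread that nothing prevents a resolver from failing hard.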
I don't think the existence of server support for a less-secure protocol is sufficient signal. We should plan to spend decades with some resolvers implementing downgrade-resistant DoT, some implementing DoT with fallback heuristics, and some not implementing DoT at all. During that period, auth servers won't be able to disable Do53, so we won't be able to use the absence of Do53 as a signal about the reliability of the DoT service.

You can see a variation on this problem in draft-ietf-tls-svcb-ech, which says that ECH-aware clients can distinguish between "fail open" and "fail closed" by whether ECH is offered on all records in the ServiceMode RRset. This works because ECH-aware clients never fall back from ECH to non-ECH within a single ServiceMode record, so "fail closed" is expressible by offering ECH on every record in the RRset. We currently don't have a no-fallback rule like that for encrypted transports in DELEG. We could certainly add one, but doing so would likely double the number of DELEG ServiceMode records for decades. That's inconvenient, especially if these records start to include nontrivial payloads.

--Ben

________________________________
From: DNSOP <dnsop-boun...@ietf.org> on behalf of Edward Lewis <edward.le...@icann.org>
Sent: Wednesday, February 14, 2024 7:23 AM
To: Manu Bretelle <chan...@gmail.com>
Cc: dnsop@ietf.org <dnsop@ietf.org>
Subject: Re: [DNSOP] [Ext] Re: General comment about downgrades vs. setting expectations in protocol definitions

From: Manu Bretelle <chan...@gmail.com>
Date: Tuesday, February 13, 2024 at 19:03
To: Edward Lewis <edward.le...@icann.org>
Cc: "dnsop@ietf.org" <dnsop@ietf.org>
Subject: Re: [DNSOP] [Ext] Re: General comment about downgrades vs. setting expectations in protocol definitions

First - why am I resisting this proposal? I believe that, for the sake of operations, protocol development must trend towards simplicity. I would add a flag or field when necessary and only then, lest it be forgotten (a burden with no benefit upon code maintainers) or, worse, become a stumbling block (misused, mis-set, generally misunderstood).

On Tue, Feb 13, 2024 at 7:35 AM Edward Lewis <edward.le...@icann.org> wrote:

>An operator dipping its toes with DELEG and encrypted protocols may be willing
>to signal to a resolver that such failures are likely operational failures
>because this is a testing endpoint that may be unstable due to lack of
>operational expertise. A privacy-aware resolver can then decide to fall back to
>clear text. Again, there is nothing preventing the resolver from failing hard here;
>this is out of the control of the auth server operator. All that can be done
>is to "signal".

Wouldn’t the availability of the fallback transport be enough signal that the service operator does not have full faith in the preferred transport? Having a separate flag is like having a second source of data: the two might become inconsistent, and inconsistency is a generic root cause of failures.

>I could also imagine an operator going through their first cert rotation
>erring on the side of safety and switching to "testing" mode temporarily.

A bit of my concern is that sometimes we forget to remove the training wheels once we’ve learned. A common error in operations is to forget the cleanup phase (remove old files, etc.)
once new functionality has been proven. This is one reason why I’m hesitant to support having a flag like this.

>If you look back at DNSSEC, had it been possible to turn DNSSEC into
>"permissive" mode, would more operators have taken the leap to enable it,
>knowing that validating resolvers would have been willing to fall back
>while the flag is on? I think from an operational point of view, this
>is something that can be of great help to build operational confidence and
>expertise without taking the risk of breaking one's DNS.

Yes, yes it would. Early on there was criticism that DNSSEC was “ok” or “fail”. When operators messed up their key rotations (this happened quite often around 2010), there were calls to “purge caches” and even some thought given to automating a way for operators to initiate a global cache purge of their data. (Failed, of course - there’s no way.) This was followed by the development of negative trust anchors after the COMCAST/NASA.gov issue, something that was an uphill battle by operators to get documented in an IETF document. More recently, an operator asked me about developing a new resource record type that could be published at a zone apex to signal that all validations of records signed by the apex keyset ought to be ignored. (Sketched up, but not what the operator had in mind.)

Operators list the great leap of risk as a reason not to implement DNSSEC. The protocol design did not accommodate a soft introduction. The levels of certainty are binary - thumbs up or thumbs down - thanks to the reliance on the DNS response code as the only error channel. When I wrote a prototype validator during experimentation on DNSSEC, I realized that there were 50 or so if statements, any one of which would cause validation to fail. Some of those failure conditions were likely transient, some persistent, and so on; this information could have informed the response. But we didn’t have enough bandwidth (that response code field was all we had) to feed that back up the chain.
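The single-error-channel problem can be illustrated with a small Python sketch, assuming a validator built as a chain of checks. The failure reasons, their transient/persistent classification, and the `with_detail` channel are hypothetical; only the RCODE values 0 (NOERROR) and 2 (SERVFAIL) come from the DNS specification. A transient timeout and a persistent signature problem both surface as the same SERVFAIL unless something richer carries the reason.

```python
from enum import Enum
from typing import Callable, Iterable, Optional

NOERROR = 0   # DNS RCODE values from RFC 1035
SERVFAIL = 2


class Failure(Enum):
    """Illustrative validation failure reasons (not an exhaustive or standard list).

    Each value is (description, transient), where `transient` is a guess at
    whether simply retrying later could succeed.
    """
    AUTH_TIMEOUT = ("authoritative server timed out", True)
    DNSKEY_FETCH_FAILED = ("could not fetch DNSKEY RRset", True)
    SIGNATURE_EXPIRED = ("RRSIG validity period has ended", False)
    UNSUPPORTED_ALGORITHM = ("DNSKEY algorithm not supported", False)

    def __init__(self, description: str, transient: bool) -> None:
        self.description = description
        self.transient = transient


Check = Callable[[], Optional[Failure]]


def validate(checks: Iterable[Check]) -> Optional[Failure]:
    """Run each check in order; the first one that reports a failure wins."""
    for check in checks:
        failure = check()
        if failure is not None:
            return failure
    return None


def rcode_only(failure: Optional[Failure]) -> int:
    """All the caller sees when the response code is the only error channel."""
    return NOERROR if failure is None else SERVFAIL


def with_detail(failure: Optional[Failure]) -> tuple:
    """A hypothetical richer channel: rcode plus the reason and its transience."""
    if failure is None:
        return (NOERROR, None, None)
    return (SERVFAIL, failure.description, failure.transient)


checks = [lambda: None, lambda: Failure.AUTH_TIMEOUT, lambda: Failure.SIGNATURE_EXPIRED]
result = validate(checks)
print(rcode_only(result))   # 2
print(with_detail(result))  # (2, 'authoritative server timed out', True)
```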
We probably then ought to have defined an extended response code mechanism - which is now a work in progress in DNSOP, if I’m right.

In summary - I think this flag would be redundant given the availability of a means to fall back. Basing the justification on a “testing phase” assumes that it is a distinct phase with a declared ending - which I don’t believe is often true. And I think we do need to build in ways to lower the risk of adoption (initial or otherwise): one is better feedback, others are the ability to “test in prod” (“immediate trial period, when staff is able to watch it launch before leaving for lunch”) and so on.
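The two readings of "signal" in this thread, inferring softness from an advertised fallback transport versus reading an explicit flag, can be compared in a minimal Python sketch. Everything here is hypothetical: `Delegation`, the transport strings and the `testing` field are illustrative, not DELEG syntax.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Delegation:
    zone: str
    transports: FrozenSet[str]  # e.g. frozenset({"dot", "do53"})
    testing: bool = False       # hypothetical explicit flag under discussion


def soft_fail_by_inference(d: Delegation) -> bool:
    """Advertising a fallback transport is itself taken as the signal."""
    return "dot" in d.transports and "do53" in d.transports


def soft_fail_by_flag(d: Delegation) -> bool:
    """Only an explicit flag marks the encrypted endpoint as soft."""
    return "dot" in d.transports and d.testing


def flag_without_fallback(d: Delegation) -> bool:
    """One way two data sources can disagree: the flag invites a fallback
    that the published transports do not offer."""
    return d.testing and "do53" not in d.transports


# While many resolvers still lack DoT, both operators keep Do53 published.
trial = Delegation("trial.example.", frozenset({"dot", "do53"}), testing=True)
mature = Delegation("mature.example.", frozenset({"dot", "do53"}))
print(soft_fail_by_inference(trial), soft_fail_by_inference(mature))  # True True
print(soft_fail_by_flag(trial), soft_fail_by_flag(mature))            # True False
```

If Do53 stays published for decades, as Ben's reply assumes, inference marks both zones as soft and only the flag separates them; `flag_without_fallback` captures Ed's concern that a second source of data can contradict the first.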
_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop