Automation isn't a solution in and of itself. When I recently mentioned during a panel discussion that automation is essential (for scalability), an operator on the same panel responded that automation is also a great way to scale problems. Automation is needed, but it must be done correctly and rely on good heuristics when it can't be deterministic. Automation contributes to resiliency insofar as it addresses "fat-fingering" and "forgot to do it", but it won't address systemic issues. Automation won't fix weaknesses, but done appropriately it enables scalability and contributes to stability. (Much the same as "rebooting" never fixes problems, but it does make them go away - for a while.)
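To make the "forgot to do it" point concrete, here is a toy pre-publication check of the kind an automated pipeline could run before pushing a signed zone. This is only a sketch; the names and data structures (Rrsig, publish_zone, push) are invented for illustration and not drawn from any particular signer or tool.

# Hypothetical sketch: refuse to publish a zone whose signatures are about
# to expire - the classic "forgot to re-sign" benign error.  All names here
# are made up for this example.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Rrsig:
    owner: str
    type_covered: str
    expiration: datetime  # signature expiration time, UTC

def unsafe_signatures(rrsigs, resign_window=timedelta(days=3)):
    """Return the signatures that expire before now + resign_window."""
    deadline = datetime.now(timezone.utc) + resign_window
    return [s for s in rrsigs if s.expiration < deadline]

def publish_zone(zone_name, rrsigs, push):
    stale = unsafe_signatures(rrsigs)
    if stale:
        # Catching this is what automation is good at; a systemic problem
        # (e.g. the signer itself misconfigured) is not fixed by this check.
        raise RuntimeError(
            f"refusing to publish {zone_name}: {len(stale)} RRSIG(s) expire "
            f"within the re-sign window, e.g. "
            f"{stale[0].owner}/{stale[0].type_covered}")
    push(zone_name)

Such a check removes one class of human error, but, as the operator on the panel pointed out, if the check itself is wrong it will scale that error across every zone it touches.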
Nevertheless, the protocol definition has to expect and react appropriately to benign operational errors. What this means is that the protocol definition needs to include features a receiver can use, when expectations are not met, to determine how to react. In the first DNSSEC validator (circa 1998), there were 50-100 different error codes - some indicated transient problems, some persistent, some suspicious, some superficial - but only SERVFAIL was available to signal an error, a well-known knock on the DNSSEC design. (A toy sketch of that signal loss is at the end of this message, after the quoted text.) In a perfect world, the protocol definition would not give rise to mistakes, and a design ought to be graded on how far it goes toward that goal, but there'll never be a perfect world.

As far as deployment, I think measurements of that ought to be integral in judging how well a protocol is designed. I attended part of the "Evolvability, Deployability, & Maintainability" (edm) WG session at IETF 118 and joined the mailing list to make that point, but have heard no reaction. The discussion was focused only on seeing multiple implementations, falling short of examining whether anyone made use of the code (paths). Deployment, to me, is how the field of operations grades a protocol definition.

On 2/1/24, 07:49, "DNSOP on behalf of Peter Thomassen" <dnsop-boun...@ietf.org on behalf of pe...@desec.io> wrote:

On 2/1/24 13:34, Edward Lewis wrote:
> The proper response will depend on the reason - more accurately the presumed (lacking any out-of-band signals) reason - why the record is absent.

Barring any other information, the proper response should IMHO not depend on the presumed reason, but assume the worst case. Anything else would break expected security guarantees.

> From observations of the deployment of DNSSEC, [...]
> It’s very important that a secured protocol be able to thwart or limit damage due to malicious behavior, but it also needs to tolerate benign operational mistakes. If mistakes are frequent and addressed by dropping the guard, then the security system is a wasted investment.

That latter sentence seems right to me, but it doesn't follow that the protocol needs to tolerate "benign operational mistakes". Another approach would be to accompany protocol deployment with a suitable set of automation tools, so that the chance of operational mistakes goes down. That would be my main take-away from DNSSEC observations.

In other words, perhaps we should consider a protocol incomplete if the spec doesn't easily accommodate automation, and deployment without it would yield significant operational risk. Let's try to include automation aspects from the beginning.

Peter

--
https://desec.io/
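Here is the toy sketch referred to above, illustrating how a validator that distinguishes many internal failure classes still collapses them into a single SERVFAIL on the wire. The category names and mapping are invented for this example; they are not the actual codes from that 1998 validator.

# Toy illustration of the signalling gap: many internal diagnoses, one wire
# signal.  The categories below are invented for the example.
from enum import Enum, auto

class ValidationOutcome(Enum):
    TRANSIENT = auto()    # e.g. a parent server was unreachable just now
    PERSISTENT = auto()   # e.g. a DS/DNSKEY mismatch that won't fix itself
    SUSPICIOUS = auto()   # e.g. a signature failing over data that looks altered
    SUPERFICIAL = auto()  # e.g. a cosmetic problem that doesn't affect the proof

SERVFAIL = 2  # the one RCODE available to report any of the above

def rcode_for(outcome: ValidationOutcome) -> int:
    # Every distinct diagnosis maps to the same answer, so the receiver
    # cannot choose a different reaction per cause - it can only guess.
    return SERVFAIL

A receiver that could tell a TRANSIENT failure from a SUSPICIOUS one could retry the former and hard-fail the latter; with only SERVFAIL, it cannot.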