On Thu, Aug 25, 2022 at 1:35 PM Ben Schwartz <bem...@google.com> wrote:
> Brian proposes a use case of serving only a warning message on the
> origin endpoint, in order to minimize the load on IP addresses that are
> likely hardcoded into a customer's zone.

So, the major update to add to this is:

- We (GoDaddy) have revisited this approach, and are now considering a much better design (summary follows below)

The design we are considering is deployment of Web redirect servers (via apex A/AAAA records) which return HTTP 301 permanent redirect responses. These would respond to connections to the apex domain ("example.com") and redirect the client to a non-apex name ("www.example.com"). The non-apex name would have a CNAME to redirect to the actual delegated authority. The RDATA on the CNAME would be identical to the RDATA on the apex HTTPS record.

Note the following:

- This will provide legacy clients the same eventual connectivity as the HTTPS record, including connecting to the correct (aka "best") target node at the CNAME/HTTPS target name, since both are resolved by the client's resolver
- Legacy clients will pay a one-time latency penalty for the HTTP 301 connection and redirect. This penalty is incurred once per domain, per client.
- The apex A/AAAA, HTTPS, and www CNAME records are all cacheable, and likely to have long TTLs
- The target name is identical, so client-resolver caching benefits both legacy and HTTPS-aware clients.
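As a sketch, the record set for such a design might look like the following. The addresses, TTLs, and CDN target name here are illustrative placeholders (addresses from the RFC 5737 / RFC 3849 documentation ranges), not actual GoDaddy deployment details:

```zone
; Apex A/AAAA point at the HTTP 301 redirect servers (placeholder addresses)
example.com.      3600 IN A     192.0.2.80
example.com.      3600 IN AAAA  2001:db8::80

; Apex HTTPS record (AliasMode) pointing at the delegated authority
example.com.      3600 IN HTTPS 0 cdn-target.example.net.

; The 301 redirect target: a CNAME whose target matches the HTTPS record's
www.example.com.  3600 IN CNAME cdn-target.example.net.
```

The key property is that the HTTPS record and the www CNAME name the same target, so both legacy and HTTPS-aware clients converge on the same resolution path.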
Note also the following:

- The target of both the HTTPS and CNAME records is the same
- Resolution failures or connection failures will have a shared fate between legacy and HTTPS-aware clients
- An HTTPS-aware client that attempts the fallback procedure will experience the legacy-mode delay due to the HTTP 301 redirect, but will still end up hitting the same issue that triggered the fallback
- In other words, for this publication scheme, fallback will NEVER achieve its desired/expected goal
- Individual instances of fallback working due to temporary issues would have had the same success achieved by merely retrying the connection or resolution (tautologically!)

If we do deploy this, we will do so on all our customer domains using HTTPS. This means that for those domains (in the millions or tens of millions), the fallback in the draft will only add overhead while never actually achieving any successful connections (due to the shared fate between legacy and HTTPS). We know this will be the case for these domains.

The logical approach would be to do one of the following things:

- Allow the domain owner to signal that fallback will not work, e.g.:
  - An AliasMode SvcParam (e.g. example.com HTTPS 0 mycdn.example.net nofallback)
  - or a third HTTPS "mode" record, to signal no fallback (e.g. example.com HTTPS 65534 . where 65534 is the "no fallback" mode signal, and "." is simply a placeholder domain needed for RDATA structural consistency)
  - Both of these would require significant changes to the draft, to clients, and to authority servers.
  - Strongly not recommended
- Allow the domain owner to supply fallback addresses only explicitly, and in the absence of those, not do the fallback (e.g. using an attrleaf prefix label)
  - This is the "presence/absence is the signal" approach (e.g. _https_fallback.example.com A 192.0.2.1, with the address chosen from one of the RFC 5737 blocks)
  - This is also extensible, since the attrleaf prefix would presumably be SVCB-record specific: HTTPS would have its own attrleaf prefix, and each new SVCB-compatible record would have its own attrleaf prefix
  - Would require changes to the draft, to clients, and to authority server zone publication automation (but not to the authority server software)
  - Not recommended
- Remove fallback from the draft
  - Signaling is only needed if fallback is included in the draft
  - Much less work; only clients would require changes, and only to remove code/logic
  - Fail fast
  - Deterministic and reliable behavior
  - Interoperable across client implementations and server implementations
  - Still requires changes to the draft
  - Least of the three "evils"

Removing the language from the draft does not force implementers not to do their own thing. Individual client implementations could still do fallback, but would not be required to do so. It does, however, put more responsibility on implementers to respond to issues raised if adverse effects result. It might be advisable to make it a user-configurable option, possibly off by default. Implementers would not be able to deflect blame for problems via the "it's what the RFC says" response, if problems do occur.

> Instead, the draft attempts to ensure that deploying and implementing the
> HTTPS record "does no harm", by giving participating clients no worse
> reliability than legacy clients.

This is one place where quantitative data would help the conversation immensely. Is there data concerning the failures observed (DNS resolution or HTTP connections) in following CNAME records from authoritative zones to CDNs? If the failure rates are really low, is that worth the effort of adding this fallback flow?
If the failure rates are highly variable (by topology, DNS resolver instance, client machine specs, network environment, etc.), is there any experimental data to support a statistically significant improvement using different approaches? Is the DNS component a major contributor, or not? If not, perhaps the benefits of ServiceMode actually become more important, and falling back is actually likely to degrade, rather than improve, the user experience?

Is the implementation of fallback strictly speculative? If so, perhaps leaving it out of the draft, presenting results at DNS-OARC once data is available, and publishing a -bis draft to include fallback (if the data supports doing so) is a better approach?

> For example, post-deployment data from browsers may show that we could
> eliminate the final fallback without reducing reliability.

Among the problems introduced by HTTPS-aware clients successfully obtaining AliasMode records, and then subsequently connecting via apex A/AAAA records (when fallback occurs), is that DNS-level observations are adversely affected. This is true whether observing the authoritative servers for the zone, or the recursive resolvers that clients are querying. Looking only at the DNS traffic will yield data that is difficult to correlate and interpret. There will not be a clean "signal" identifying legacy-only clients. There will not be any ability to correlate fallback behavior with client software (browser "brand" generally, or brand+version). So, attempting to optimize for failure can actually negatively impact measurement of failure and root cause analysis.

> Viktor notes with concern that AliasMode is a "non-deterministic
> redirect". Instead, the draft attempts to model the client behavior as a
> preference ordered stack of endpoints:
>
> 1. Basic: the origin endpoint (status quo ante)
> 2. Better: the endpoint at the end of the AliasMode chain
> 3. Best: the ServiceMode records
>
> I think it's best to think of AliasMode as an alias that is optional when
> SVCB is optional, and mandatory when SVCB is mandatory.
>
> This seems natural enough to me, and allows it to be used in environments
> like the web where "fail fast" is not an appealing option.

Fail fast may not be appealing, but in some (probably the majority of) cases, it may be the most correct option. It may also be the case that the zone owner knows whether this is so. I think it is much more likely that explicitly declaring the situation (if known) is more useful than having several billion clients independently attempting to infer whether the first option will even work, let alone provide a useful alternative to the second or third.

Brian
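The shared-fate argument above can be illustrated with a toy model. The zone contents, names, and functions below are mine (illustrative only, not draft mechanics or any real implementation): both the HTTPS-record path and the legacy apex-plus-301 path terminate at the same target name, so fallback can only succeed in exactly the cases where a plain retry would have:

```python
# Toy model: both resolution paths end at the same target name, so
# fallback shares fate with the primary path. All names are placeholders.
ZONE = {
    ("example.com", "HTTPS"): "cdn-target.example.net",    # AliasMode target
    ("www.example.com", "CNAME"): "cdn-target.example.net",
    ("example.com", "A"): "192.0.2.80",                    # 301 redirect server
}

def connect(target: str, reachable: set) -> bool:
    """Pretend to connect: succeeds only if the target is currently reachable."""
    return target in reachable

def https_aware_client(reachable: set) -> bool:
    """Try the HTTPS-record path; on failure, fall back to the legacy path."""
    target = ZONE[("example.com", "HTTPS")]
    if connect(target, reachable):
        return True
    # Fallback: connect to the apex A record, follow the HTTP 301 to www,
    # follow the CNAME... which is the *same* target that just failed.
    redirect_target = ZONE[("www.example.com", "CNAME")]
    return connect(redirect_target, reachable)

# When the shared target is down, fallback buys nothing:
assert https_aware_client(reachable=set()) is False
# When it is up, the fallback path never even runs:
assert https_aware_client(reachable={"cdn-target.example.net"}) is True
```

The only way the fallback branch returns a different answer is if reachability changed between the two attempts, which is precisely the "retry would have worked anyway" case.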
_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop