SVCB's syntax would require us to exclude not only non-ASCII characters but also arbitrary delimiters like commas, right? I think that's going a bit too far. As Ryan notes, complex definitions of allowed strings lead to ambiguity about who is responsible for validating what, and to subtle variations between implementations. That ambiguity can lead to injection attacks when one component of a system expects some validation to have happened, but the other component disagrees.
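As a rough illustration of that kind of disagreement, here is a minimal Python sketch. It is not the actual SVCB grammar: the function names and the backslash-escaping rule are assumptions made up for the example. One component joins identifiers with commas and assumes someone else has validated them; the other splits on commas and gets a different list back.

```python
# Hypothetical sketch: a comma-delimited list of ALPN identifiers.
# The escaping scheme below is a simplified assumption, not the SVCB grammar.

def naive_join(alpn_ids):
    """Join identifiers with commas, assuming none of them contains one."""
    return b",".join(alpn_ids)


def naive_split(value):
    """Split on every comma, with no notion of escaping."""
    return value.split(b",")


def escaped_join(alpn_ids):
    """Join identifiers, escaping backslashes and commas inside each one."""
    return b",".join(
        i.replace(b"\\", b"\\\\").replace(b",", b"\\,") for i in alpn_ids
    )


def escaped_split(value):
    """Split on unescaped commas and undo the backslash escaping."""
    items, current, i = [], bytearray(), 0
    while i < len(value):
        byte = value[i:i + 1]
        if byte == b"\\" and i + 1 < len(value):
            # Escaped byte: take the next byte literally.
            current += value[i + 1:i + 2]
            i += 2
            continue
        if byte == b",":
            items.append(bytes(current))
            current = bytearray()
        else:
            current += byte
        i += 1
    items.append(bytes(current))
    return items


ids = [b"h2", b"foo,bar"]
print(naive_split(naive_join(ids)))      # two identifiers go in, three come out
print(escaped_split(escaped_join(ids)))  # round-trips, if both sides agree on escaping
```

The escaped variant only helps if every component in the chain implements the same rule, which is the validation-responsibility problem described above.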
I think a system that consistently expects a simple data type is more robust than one that carefully maneuvers around commas, spaces, newlines, etc. Text protocol syntaxes aren't a universal syntax, and for every delimiter-based protocol we dodge, there'll be one we hit. For instance, ALPN identifiers already cannot be used as filenames because "http/1.1" includes a slash.

More generally, the ship has sailed. Applications already need to tolerate arbitrary 8-bit ALPN strings out of their TLS libraries.

That said, documenting some interop risks when allocating values is reasonable. IIRC, a lot of Java TLS stacks have issues with non-UTF8 (perhaps even non-ASCII?) identifiers. The getApplicationProtocol() API reports the ALPN protocol as a String (16-bit), and implementations sometimes pick an arbitrary character set without paying attention.

Note this isn't a fundamental limitation of 16-bit strings. It's possible to convey 8-bit-clean data in a 16-bit string, if you define a suitable, if inelegant, encoding.

https://w3c.github.io/resource-timing/#dom-performanceresourcetiming-nexthopprotocol
https://infra.spec.whatwg.org/#isomorphic-decode

On Thu, May 20, 2021 at 10:40 AM Ben Schwartz <bemasc= 40google....@dmarc.ietf.org> wrote:

> On Thu, May 20, 2021 at 6:30 AM Salz, Rich <rsalz=
> 40akamai....@dmarc.ietf.org> wrote:
>
>> Look at RFC 7301, it says: the precise set of octet values that identifies
>> the protocol. This could be the UTF-8 encoding of the protocol name.
>>
>> So I changed my mind and think it's okay to leave as-is but wouldn't mind
>> if it became less general or more specific. For example, what if a protocol
>> string matches a truncated UTF8 string? It makes me think of SNI which
>> could have any format, but really is "any format as long as it's a DNS name"
>>
>
> One intermediate option might be to keep the ALPN TLS extension 8-bit
> clean, but change the IANA instructions for the ALPN registry to tighten
> the registration requirements.
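To illustrate the earlier point about carrying 8-bit-clean data in a 16-bit string: the "isomorphic decode" linked above maps each byte to the code point with the same value. A minimal Python sketch follows, assuming Python's "latin-1" codec as a stand-in for that mapping (it maps bytes 0x00 to 0xFF onto U+0000 to U+00FF). The function names and the sample identifier are mine, not taken from any TLS library.

```python
# Sketch of an isomorphic (byte value == code point) round trip.
# Function names are hypothetical; "latin-1" is used as the byte-to-code-point map.

def isomorphic_decode(data: bytes) -> str:
    """Map each byte to the code point with the same numeric value."""
    return data.decode("latin-1")


def isomorphic_encode(text: str) -> bytes:
    """Inverse mapping; raises UnicodeEncodeError for code points above U+00FF."""
    return text.encode("latin-1")


# A made-up ALPN identifier that is not valid UTF-8.
raw = bytes([0x68, 0x32, 0xFF, 0x00, 0x80])

as_text = isomorphic_decode(raw)
assert isomorphic_encode(as_text) == raw  # lossless round trip

# For contrast: decoding as UTF-8 with replacement characters loses information.
lossy = raw.decode("utf-8", errors="replace")
assert lossy.encode("utf-8") != raw
```

The string form is not meaningful to display, but it lets an API that only exposes strings hand back the original octets unchanged.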
_______________________________________________ TLS mailing list TLS@ietf.org https://www.ietf.org/mailman/listinfo/tls