John Levine wrote:
> >What I'm asking is how the octet sequences provided by the URI RR RFC
> >are decoded into the sequences of URI characters used by the URI RFC.
> >Is there a generic way to do this, or does it depend on the specific
> >protocol (e.g., HTTP), or is it left up to the application?
>
> As far as I can see, RFC 3986 defines URIs as sequences of ASCII
> characters.  In the few places where they mention non-ASCII material,
> it says to represent them as percent encoded UTF-8, so it's still all
> ASCII.

OK.  That RFC seems to distance itself from mere octets.

> Can you give an example of URI RDATA where it would make sense to
> interpret it other than as ASCII?

This is the FTP example from the URI RR RFC, to which the UTF-8 byte
order mark has been gratuitously added:

    TYPE256 \# 36 000a0001efbbbf6674703a2f2f667470312e6578616d706c652e636f6d2f7075626c6963

or, equivalently,

    URI 10 1 "\239\187\191ftp://ftp1.example.com/public"

Attempting to decode it as ASCII simply does the wrong thing, but I
don't see any reason that it's not a valid URI RR, and, knowing that
it's encoded as UTF-8 w/ BOM, it can be successfully parsed into a URI
(provided the Target field is handed off to the URI-parsing application
as raw bytes, and not as a string with DNS zone file \DDD style
escapes).

> I suppose to be perfectly clear we might either say "percent encode
> everything" or we might say "unencoded UTF-8 is allowed."  They're
> both unambiguous, and I expect most parsers can handle both.

It would be very nice indeed if application developers did not have to
guess at the encoding of the bytes.

-- 
Robert Edmonds
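
P.S. For anyone who wants to check the bytes, here is a minimal Python
sketch of the point above.  It assumes the wire layout visible in the
RDATA itself (16-bit Priority, 16-bit Weight, then the Target octets).
The helper names split_uri_rdata and unescape_ddd are made up for this
message and do not come from any RFC or library, and unescape_ddd only
handles presentation text whose unescaped characters are ASCII.

```python
import re
from urllib.parse import urlsplit

RDATA_HEX = ("000a0001efbbbf6674703a2f2f667470312e6578616d706c65"
             "2e636f6d2f7075626c6963")


def split_uri_rdata(rdata: bytes):
    """Split URI RDATA into (priority, weight, target octets)."""
    priority = int.from_bytes(rdata[0:2], "big")
    weight = int.from_bytes(rdata[2:4], "big")
    return priority, weight, rdata[4:]


# One token is a \DDD decimal escape, a \X escape, or a plain character.
_TOKEN = re.compile(r"\\([0-9]{3})|\\(.)|(.)", re.DOTALL)


def unescape_ddd(text: str) -> bytes:
    """Turn zone file presentation text back into raw octets."""
    out = bytearray()
    for ddd, escaped, plain in _TOKEN.findall(text):
        if ddd:
            out.append(int(ddd))          # \239 becomes the octet 0xEF
        else:
            out.append(ord(escaped or plain))
    return bytes(out)


rdata = bytes.fromhex(RDATA_HEX)
priority, weight, target = split_uri_rdata(rdata)

# The two presentation forms carry the same Target octets.
same_target = unescape_ddd(r"\239\187\191ftp://ftp1.example.com/public") == target

# Decoding as ASCII fails on the very first octet of the BOM.
try:
    target.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Knowing the encoding, the raw octets decode cleanly; Python's
# "utf-8-sig" codec strips a leading BOM if one is present.
uri = target.decode("utf-8-sig")
parts = urlsplit(uri)

print(priority, weight, ascii_ok, uri)
```

Note that the decode step only works because the encoding was known in
advance, which is exactly the guess an application should not have to
make.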