John Levine wrote:
> >What I'm asking is how the octet sequences provided by the URI RR RFC
> >are decoded into the sequences of URI characters used by the URI RFC.
> >Is there a generic way to do this, or does it depend on the specific
> >protocol (e.g., HTTP), or is it left up to the application?
> 
> As far as I can see, RFC 3986 defines URIs as sequences of ASCII
> characters.  In the few places where they mention non-ASCII material,
> it says to represent them as percent encoded UTF-8, so it's still all
> ASCII.

OK.  That RFC seems to distance itself from mere octets.

> Can you give an example of URI RDATA where it would make sense to
> interpret it other than as ASCII?

This is the FTP example from the URI RR RFC, to which the UTF-8 byte
order mark has been gratuitously added:

    TYPE256 \# 36 000a0001efbbbf6674703a2f2f667470312e6578616d706c652e636f6d2f7075626c6963

or, equivalently,

    URI 10 1 "\239\187\191ftp://ftp1.example.com/public";
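
To check that the two presentations above really carry the same octets,
here is a minimal Python sketch. The helper name unescape_presentation is
hypothetical (it is not from any DNS library), and it handles only the
\DDD and \X escapes, assuming everything else in the string is ASCII.

```python
def unescape_presentation(text: str) -> bytes:
    """Convert zone-file text with \\DDD and \\X escapes into raw octets."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out.append(ord(text[i]))  # assumes unescaped characters are ASCII
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("dangling backslash")
        digits = text[i + 1:i + 4]
        if len(digits) == 3 and all(c in "0123456789" for c in digits):
            out.append(int(digits))  # \DDD is a decimal octet value, 000-255
            i += 4
        else:
            out.append(ord(text[i + 1]))  # \X stands for the literal character X
            i += 2
    return bytes(out)


# RDATA from the generic (TYPE256 \# 36) form of the record.
generic = bytes.fromhex(
    "000a0001efbbbf6674703a2f2f667470312e6578616d706c652e636f6d2f7075626c6963"
)
priority = int.from_bytes(generic[0:2], "big")  # first 16 bits
weight = int.from_bytes(generic[2:4], "big")    # next 16 bits
target = generic[4:]                            # remainder is the Target field

assert len(generic) == 36
assert (priority, weight) == (10, 1)
assert target == unescape_presentation(r"\239\187\191ftp://ftp1.example.com/public")
```

The three escaped octets 239, 187, 191 are EF BB BF, the UTF-8 byte order
mark.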

Decoding it as ASCII fails on the three leading octets (EF BB BF), but I
don't see any reason that it's not a valid URI RR.  Knowing that it's
encoded as UTF-8 with a BOM, it can be successfully parsed into a URI,
provided the Target field is handed off to the URI-parsing application
as raw bytes and not as a string with DNS zone file \DDD style
escapes.
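
A rough sketch of that difference in Python follows. It assumes the
application already knows the Target is UTF-8 with a BOM, which is
exactly the knowledge the RR itself does not convey. The function name
decode_target is made up for illustration.

```python
from urllib.parse import urlsplit

# Target octets from the example record: UTF-8 BOM followed by the FTP URI.
target = b"\xef\xbb\xbfftp://ftp1.example.com/public"


def decode_target(raw: bytes) -> str:
    """Decode Target octets as UTF-8, dropping a leading BOM if present."""
    return raw.decode("utf-8-sig")


try:
    target.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False  # 0xEF is outside the 7-bit ASCII range

uri = decode_target(target)
parts = urlsplit(uri)
print(ascii_ok, uri, parts.scheme, parts.netloc, parts.path)

# Handing over the presentation-format text instead of the raw octets
# does not yield an FTP URI, because the escapes are taken literally.
escaped = urlsplit(r"\239\187\191ftp://ftp1.example.com/public")
print(repr(escaped.scheme))
```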

> I suppose to be perfectly clear we might either say "percent encode
> everything" or we might say "unencoded UTF-8 is allowed."  They're
> both unambiguous, and I expect most parsers can handle both.

It would be very nice indeed if application developers did not have to
guess at the encoding of the bytes.
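
For illustration, if unencoded UTF-8 were allowed, a consumer could
still reduce a Target to the all-ASCII "percent encode everything" form
before parsing. The sketch below is one way to do that in Python, not
anything the URI RR RFC specifies. The helper name, the choice of safe
characters, and the second example URI are all assumptions.

```python
from urllib.parse import quote

# RFC 3986 reserved characters plus "%", left untouched so that delimiters
# and existing percent-escapes survive. Unreserved characters are never
# escaped by quote().
SAFE = ":/?#[]@!$&'()*+,;=%"


def percent_encode_target(raw: bytes) -> str:
    """Return an all-ASCII form of Target, percent-encoding other octets."""
    return quote(raw, safe=SAFE)


# An already-ASCII Target passes through unchanged.
plain = percent_encode_target(b"ftp://ftp1.example.com/public")

# A hypothetical Target carrying raw UTF-8 (U+00E9 as C3 A9) in its path.
encoded = percent_encode_target(b"http://example.com/caf\xc3\xa9")

print(plain)
print(encoded)
```

This treats the host like any other component, which a real
implementation might well handle differently.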

-- 
Robert Edmonds

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
