Re: Use cases for invalid-Unicode atoms

Anne van Kesteren Wed, 21 Mar 2018 02:41:23 -0700

On Wed, Mar 21, 2018 at 10:27 AM, Henri Sivonen <hsivo...@hsivonen.fi> wrote:
>  * A bunch of things atomicize URL components. I'd hope that the URLs
> were converted from UTF-16 to UTF-8 at some prior point ensuring UTF-8
> validity, but it's hard to be sure.


At least per the specification, all URL components end up with only
ASCII code points after parsing the URL. I think we match that these
days, though for UI purposes we go back to Unicode at times. I don't
think we convert to Unicode if the percent-encoded sequences are not
valid UTF-8 byte sequences, though.
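
As a rough illustration of that last point, here is a Python sketch (not
Gecko code; `display_form` and `lossy` are hypothetical names, and real
URL display logic has more rules than this). It assumes the parsed
component is already ASCII with percent-encoded bytes, and only produces
a Unicode form when those bytes are valid UTF-8. It also shows why lossy
decoding would be worse for comparisons: distinct invalid inputs collapse
to the same U+FFFD string.

```python
from urllib.parse import unquote_to_bytes


def display_form(component: str) -> str:
    """Return a Unicode form for UI use, or the ASCII form unchanged.

    Hypothetical helper: decodes percent-encoded bytes only when they
    form valid UTF-8; otherwise the percent-encoded ASCII is kept.
    """
    raw = unquote_to_bytes(component)
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Invalid UTF-8: do not go back to Unicode at all.
        return component


def lossy(component: str) -> str:
    """Decode with U+FFFD replacement (the behaviour to avoid for checks)."""
    return unquote_to_bytes(component).decode("utf-8", errors="replace")


print(display_form("caf%C3%A9"))      # valid UTF-8, decoded for display
print(display_form("%FF"))            # invalid UTF-8, stays percent-encoded
print(lossy("%FF") == lossy("%FE"))   # different inputs, same U+FFFD output
```

Comparing the ASCII forms directly sidesteps the collision, since "%FF"
and "%FE" remain distinct strings.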


> To the extent these are used for
> security checks, having NaN atoms that match nothing could be safer
> than having different inputs yield the same U+FFFD sequences to make
> them match.
>
> The query string can introduce invalid UTF-8 into a URL, but I believe
> we never do security checks based on query part. I believe we're
> supposed to be doing security checks on the scheme (always ASCII),
> port (number) and the Punycode form of the host (always ASCII). Is
> this true?

We do some important checks on other parts (e.g., what service worker
to use depends on the path), but again, I'd assume those are all full
ASCII comparisons.


-- 
https://annevankesteren.nl/
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
