On Wed, Mar 21, 2018 at 10:27 AM, Henri Sivonen <hsivo...@hsivonen.fi> wrote:
> * A bunch of things atomicize URL components. I'd hope that the URLs
> were converted from UTF-16 to UTF-8 at some prior point ensuring UTF-8
> validity, but it's hard to be sure.

At least per the specification, all URL components contain only ASCII
code points once the URL has been parsed. I think we match that these
days, though for UI purposes we sometimes convert back to Unicode. I
don't think we do that conversion when the percent-encoded sequences
are not valid UTF-8 byte sequences, though.

> To the extent these are used for
> security checks, having NaN atoms that match nothing could be safer
> than having different inputs yield the same U+FFFD sequences to make
> them match.
>
> The query string can introduce invalid UTF-8 into a URL, but I believe
> we never do security checks based on query part. I believe we're
> supposed to be doing security checks on the scheme (always ASCII),
> port (number) and the Punycode form of the host (always ASCII). Is
> this true?

We do some important checks on other parts (e.g., which service worker
to use depends on the path), but again, I'd assume those are all
comparisons of ASCII-only strings.

-- 
https://annevankesteren.nl/
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
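
The ASCII-only point made in this message can be illustrated with a rough sketch. This is not Gecko's URL parser and not the URL Standard's algorithm. The helper names (host_to_ascii, path_to_ascii, display_form) are made up for illustration, and Python's built-in "idna" codec (IDNA 2003) is only a stand-in for the host processing a browser actually performs. It shows a host and a path being reduced to ASCII, and a display conversion that keeps the percent-encoded form when the bytes are not valid UTF-8.

```python
from urllib.parse import quote, unquote_to_bytes


def host_to_ascii(host: str) -> str:
    # Stand-in for host processing: Python's "idna" codec implements
    # IDNA 2003, which is an assumption made here for brevity.
    return host.encode("idna").decode("ascii")


def path_to_ascii(path: str) -> str:
    # Percent-encode non-ASCII code points as UTF-8 bytes, leaving "/"
    # and existing "%" escapes untouched.
    return quote(path, safe="/%")


def display_form(component: str) -> str:
    # For UI purposes only: go back to Unicode when the percent-encoded
    # bytes are valid UTF-8, otherwise keep the ASCII form unchanged.
    # Simplified: a real implementation would not decode every escape.
    raw = unquote_to_bytes(component)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return component


if __name__ == "__main__":
    print(host_to_ascii("b\u00fccher.example"))
    print(path_to_ascii("/caf\u00e9"))
    print(display_form("/caf%C3%A9"))
    print(display_form("/%FF"))
```

Under these assumptions, any comparison made on the output of host_to_ascii or path_to_ascii is a comparison of ASCII-only strings, and invalid UTF-8 never reaches the Unicode display path.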