On Fri, 7 Jul 2023 19:16:20 GMT, Andy Goryachev <ango...@openjdk.org> wrote:

>> DataURI uses the following implementation to decode the percent-encoded
>> payload of a "data" URI:
>>
>>
>> ...
>> String data = uri.substring(dataSeparator + 1);
>> Charset charset = Charset.defaultCharset();
>> ...
>> URLDecoder.decode(data.replace("+", "%2B"), charset).getBytes(charset)
>>
>>
>> This approach only works if the charset that is passed into
>> `URLDecoder.decode` and `String.getBytes` doesn't lose information when
>> converting between `String` and `byte[]` representations, as might happen in
>> a US-ASCII environment.
>>
>> This PR solves the problem by not using `URLDecoder`, but instead simply
>> decoding percent-encoded escape sequences as specified by RFC 3986, page 11.
>>
>> **Note to reviewers**: the failing test can only be observed when the JVM
>> uses a default charset that can't represent the payload, which can be
>> enforced by specifying the `-Dfile.encoding=US-ASCII` VM option.
>
> modules/javafx.graphics/src/main/java/com/sun/javafx/util/DataURI.java line 115:
>
>> 113:                 nameValuePairs,
>> 114:                 base64,
>> 115:                 base64 ? Base64.getDecoder().decode(data) : decodePercentEncoding(data));
>
> I wonder if this is all necessary. The data is supposed to be url-encoded,
> so it's essentially ASCII, no?
>
> passing default charset to getBytes() is not right, it probably should be
>
> URLDecoder.decode(data.replace("+", "%2B"), charset).getBytes(StandardCharsets.US_ASCII));
>
> or am I missing something?

From https://datatracker.ietf.org/doc/html/rfc3986#page-11

   Therefore, the integer values used by the ABNF must be mapped back to
   their corresponding characters via US-ASCII in order to complete the
   syntax rules.

-------------

PR Review Comment: https://git.openjdk.org/jfx/pull/1165#discussion_r1256344029
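
For illustration only, here is a minimal sketch of the two approaches discussed above. It is not the code from the PR: the class name `PercentDecodeSketch`, the helper `viaUrlDecoder`, and the body of `decodePercentEncoding` are assumptions made up for this example (only the method name `decodePercentEncoding` appears in the diff). The sketch assumes the payload contains only ASCII characters plus `%XX` escapes, and it rejects anything else.

```java
import java.io.ByteArrayOutputStream;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PercentDecodeSketch {

    // Hypothetical stand-in for the PR's decodePercentEncoding: maps each
    // %XX escape directly to one byte, without going through a String/Charset.
    static byte[] decodePercentEncoding(String data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length());
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (c == '%') {
                if (i + 2 >= data.length()) {
                    throw new IllegalArgumentException("Truncated escape at index " + i);
                }
                int hi = hexValue(data.charAt(i + 1));
                int lo = hexValue(data.charAt(i + 2));
                out.write((hi << 4) | lo);
                i += 2; // skip the two hex digits just consumed
            } else if (c < 0x80) {
                out.write(c); // unescaped characters are taken as US-ASCII
            } else {
                // Assumption of this sketch: non-ASCII input is rejected.
                throw new IllegalArgumentException("Non-ASCII character at index " + i);
            }
        }
        return out.toByteArray();
    }

    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        throw new IllegalArgumentException("Invalid hex digit: " + c);
    }

    // The String round trip quoted above, with the charset made explicit so
    // the effect of -Dfile.encoding=US-ASCII can be reproduced on any JVM.
    static byte[] viaUrlDecoder(String data, Charset charset) {
        return URLDecoder.decode(data.replace("+", "%2B"), charset).getBytes(charset);
    }

    public static void main(String[] args) {
        String payload = "%FF%00%80a";
        System.out.println(Arrays.toString(decodePercentEncoding(payload)));
        System.out.println(Arrays.toString(viaUrlDecoder(payload, StandardCharsets.US_ASCII)));
    }
}
```

With a payload such as `%FF%00%80a`, the direct decoder yields the four original bytes, while the `URLDecoder` round trip through US-ASCII does not, because bytes above 0x7F cannot survive the conversion to `String` and back. The same loss should apply whenever `getBytes` is given US-ASCII, regardless of the charset passed to `URLDecoder.decode`, since binary payloads are not limited to the ASCII range once decoded.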