Re: RFR: 8311216: DataURI can lose information in some charset environments

Andy Goryachev Fri, 07 Jul 2023 13:53:02 -0700

On Sat, 1 Jul 2023 22:24:09 GMT, Michael Strauß wrote:


> DataURI uses the following implementation to decode the percent-encoded 
> payload of a "data" URI:
> 
> 
> ...
> String data = uri.substring(dataSeparator + 1);
> Charset charset = Charset.defaultCharset();
> ...
> URLDecoder.decode(data.replace("+", "%2B"), charset).getBytes(charset)
> 
> 
> This approach only works if the charset that is passed into 
> `URLDecoder.decode` and `String.getBytes` doesn't lose information when 
> converting between `String` and `byte[]` representations, as might happen in 
> a US-ASCII environment.
> 
> This PR solves the problem by not using `URLDecoder`, but instead simply 
> decoding percent-encoded escape sequences as specified by RFC 3986, page 11.
> 
> **Note to reviewers**: the failing test can only be observed when the JVM 
> uses a default charset that can't represent the payload, which can be 
> enforced by specifying the `-Dfile.encoding=US-ASCII` VM option.
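
For reference, the difference described above can be reproduced outside of JavaFX. The following is only a minimal sketch, not the code from the PR: `viaUrlDecoder` mirrors the quoted snippet with the charset passed in explicitly, while `decodePercent` and `hexValue` are hypothetical helpers that turn `%XX` escapes directly into octets. The exception messages and the rejection of non-ASCII input are assumptions of this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PercentDecodeSketch {

    // Mirrors the quoted approach, but takes the charset as a parameter
    // instead of reading Charset.defaultCharset().
    static byte[] viaUrlDecoder(String data, Charset charset) {
        return URLDecoder.decode(data.replace("+", "%2B"), charset).getBytes(charset);
    }

    // Hypothetical helper (not the PR's code): maps each "%XX" escape to one
    // octet and copies every other ASCII character through unchanged.
    static byte[] decodePercent(String data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(data.length());
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (c > 0x7F) {
                // Assumption of this sketch: the payload is expected to be ASCII.
                throw new IllegalArgumentException("Non-ASCII character at index " + i);
            }
            if (c != '%') {
                out.write(c);
                continue;
            }
            if (i + 2 >= data.length()) {
                throw new IllegalArgumentException("Incomplete escape sequence at index " + i);
            }
            int hi = hexValue(data.charAt(i + 1));
            int lo = hexValue(data.charAt(i + 2));
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Invalid escape sequence at index " + i);
            }
            out.write((hi << 4) | lo);
            i += 2; // skip the two hex digits that were just consumed
        }
        return out.toByteArray();
    }

    // Returns the value of an ASCII hex digit, or -1 if c is not one.
    static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    public static void main(String[] args) {
        String data = "%F0%9F%99%82"; // UTF-8 octets of U+1F642, percent-encoded
        byte[] expected = {(byte) 0xF0, (byte) 0x9F, (byte) 0x99, (byte) 0x82};
        System.out.println(Arrays.equals(expected, viaUrlDecoder(data, StandardCharsets.UTF_8)));
        System.out.println(Arrays.equals(expected, viaUrlDecoder(data, StandardCharsets.US_ASCII)));
        System.out.println(Arrays.equals(expected, decodePercent(data)));
    }
}
```

With UTF-8 the `URLDecoder` round trip returns the original four octets; with US-ASCII it cannot, because `getBytes` for that charset never produces a byte above 0x7F. The byte-level helper does not consult any charset, so its result is the same in both environments.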

modules/javafx.graphics/src/test/java/test/com/sun/javafx/util/DataURITest.java 
line 183:

> 181:         // We use URLEncoder here to escape the emoji character using percent-encoding.
> 182:         // When DataURI parses its payload, it automatically converts percent-encoded characters back to octets.
> 183:         String input = URLEncoder.encode("🙂", StandardCharsets.UTF_8);

Would it make sense to try several different strings, including ones that
contain `+`, `\n`, `\t`, `data:`, `charset:`, `%`, `&`, `_`, and `%zz`, as well
as the empty string?
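
One thing to keep in mind if these inputs go through `URLEncoder` like the emoji does: it applies form encoding, so some of them will not reach the parser in the shape one might expect. The sketch below only prints what `URLEncoder.encode` produces for each sample. The class name `EncodedSamples` and the extra `"a b"` entry are additions for illustration and are not part of the test.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class EncodedSamples {

    // Candidate inputs from the suggestion above, plus "a b" (added here
    // only to show how URLEncoder treats a space).
    static final List<String> SAMPLES = List.of(
            "+", "\n", "\t", "data:", "charset:", "%", "", "&", "_", "%zz", "a b");

    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        for (String s : SAMPLES) {
            // Print the encoded form in quotes so the empty string stays visible.
            System.out.println("\"" + encode(s) + "\"");
        }
    }
}
```

Two consequences for the test design: `%zz` is encoded as `%25zz`, so a malformed escape has to be written into the URI by hand rather than passed through `URLEncoder`; and a space is encoded as `+`, which would come back as a literal plus sign if the parser keeps `+` as-is, as the `replace("+", "%2B")` call in the quoted snippet suggests.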

modules/javafx.graphics/src/test/java/test/com/sun/javafx/util/DataURITest.java 
line 203:

> 201: 
> 202:         ex = assertThrows(IllegalArgumentException.class, () -> DataURI.tryParse("data:,%0"));
> 203:         assertTrue(ex.getMessage().startsWith("Incomplete"));

Should this also cover `"%"`, `""`, and `null`?

-------------

PR Review Comment: https://git.openjdk.org/jfx/pull/1165#discussion_r1256453599
PR Review Comment: https://git.openjdk.org/jfx/pull/1165#discussion_r1256455821
