On Sat, 8 Jul 2023 23:38:05 GMT, Michael Strauß <mstra...@openjdk.org> wrote:
>> DataURI uses the following implementation to decode the percent-encoded >> payload of a "data" URI: >> >> >> ... >> String data = uri.substring(dataSeparator + 1); >> Charset charset = Charset.defaultCharset(); >> ... >> URLDecoder.decode(data.replace("+", "%2B"), charset).getBytes(charset) >> >> >> This approach only works if the charset that is passed into >> `URLDecoder.decode` and `String.getBytes` doesn't lose information when >> converting between `String` and `byte[]` representations, as might happen in >> a US-ASCII environment. >> >> This PR solves the problem by not using `URLDecoder`, but instead simply >> decoding percent-encoded escape sequences as specified by RFC 3986, page 11. >> >> **Note to reviewers**: the failing test can only be observed when the JVM >> uses a default charset that can't represent the payload, which can be >> enforced by specifying the `-Dfile.encoding=US-ASCII` VM option. > > Michael Strauß has updated the pull request incrementally with one additional > commit since the last revision: > > added more tests This change is probably a good move; not only is URLDecoder not doing the right thing, it's not that efficient when decoding a large amount of data (as it was intended for short URL's). modules/javafx.graphics/src/main/java/com/sun/javafx/util/DataURI.java line 210: > 208: > 209: private static byte[] decodePercentEncoding(String input) { > 210: try (var output = new ByteArrayOutputStream(input.length())) { If deemed important to optimize this a bit: This could allocate an array that's a factor 3 too large, which depending on how these URI's are used can become quite big (although for large amounts of data, base64 is recommended). I noticed that in the URLDecoder they limit the size of the buffer `new StringBuilder(numChars > 500 ? numChars / 2 : numChars);` -- perhaps you could do something similar (`new ByteArrayOutputStream(input.length() > 500 ? input.length() / 2 : input.length())`) Another alternative would be to calculate the exact size first (by counting `%`s) avoiding the need to reallocate the `byte[]` -- performance is (IMHO) likely to be better: - ByteArrayOutputStream method - allocate output array: zeroes n*3 bytes - decode: reads whole string once (n to n*3 reads) - decode: writes n bytes - reallocate: reads whole output once (n) - reallocate: writes n bytes - garbage created: n*3 bytes - max memory use: n*3 + n bytes Total read/writes: 4n + 5n - Count method - count: reads whole string once (n to n*3 reads) - allocate output array: zeroes n bytes - decode: reads whole string again (n to n*3 reads) - decode: writes n bytes - garbage created: none - max memory use: n bytes Total read/writes: 6n + 2n modules/javafx.graphics/src/main/java/com/sun/javafx/util/DataURI.java line 215: > 213: } catch (IOException ignored) { > 214: // can never happen for ByteArrayOutputStream > 215: return null; If this can never happen, I'd prefer to just wrap and throw the original exception, using either `IllegalStateException` or `AssertionError` (the latter being the one that is thrown from a failed `assert`). modules/javafx.graphics/src/main/java/com/sun/javafx/util/DataURI.java line 220: > 218: > 219: /** > 220: * Decodes percent-encoded text as specified by RFC 3986, page 11. minor: Suggestion: * Decodes percent-encoded text as specified by RFC 3986, section 2.1 modules/javafx.graphics/src/main/java/com/sun/javafx/util/DataURI.java line 224: > 222: */ > 223: private static void decodePercentEncodingToStream(String input, > OutputStream output) throws IOException { > 224: enum ParseState { minor (just clarification): Suggestion: enum ExpectedCharacter { ------------- Marked as reviewed by jhendrikx (Committer). PR Review: https://git.openjdk.org/jfx/pull/1165#pullrequestreview-1702702475 PR Review Comment: https://git.openjdk.org/jfx/pull/1165#discussion_r1375181167 PR Review Comment: https://git.openjdk.org/jfx/pull/1165#discussion_r1375169802 PR Review Comment: https://git.openjdk.org/jfx/pull/1165#discussion_r1375170235 PR Review Comment: https://git.openjdk.org/jfx/pull/1165#discussion_r1375171041