On Wed, 28 Aug 2024 21:56:50 GMT, Archie Cobbs <aco...@openjdk.org> wrote:
>> `GZIPInputStream` supports reading data from multiple concatenated GZIP data >> streams since [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). In >> order to do this, after a GZIP trailer frame is read, it attempts to read a >> GZIP header frame and, if successful, proceeds onward to decompress the new >> stream. If the attempt to decode a GZIP header frame fails, or happens to >> trigger an `IOException`, it just ignores the trailing garbage and/or the >> `IOException` and returns EOF. >> >> There are several issues with this: >> >> 1. The behaviors of (a) supporting concatenated streams and (b) ignoring >> trailing garbage are not documented, much less precisely specified. >> 2. Ignoring trailing garbage is dubious because it could easily hide errors >> or other data corruption that an application would rather be notified about. >> Moreover, the API claims that a `ZipException` will be thrown when corrupt >> data is read, but obviously that doesn't happen in the trailing garbage >> scenario (so N concatenated streams where the last one has a corrupted >> header frame is indistinguishable from N-1 valid streams). >> 3. There's no way to create a `GZIPInputStream` that does _not_ support >> stream concatenation. >> >> On the other hand, `GZIPInputStream` is an old class with lots of existing >> usage, so it's important to preserve the existing behavior, warts and all >> (note: my the definition of "existing behavior" here includes the bug fix in >> [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)). >> >> So this patch adds a new constructor that takes two new parameters >> `allowConcatenation` and `allowTrailingGarbage`. The legacy behavior is >> enabled by setting both to true; otherwise, they do what they sound like. In >> particular, when `allowTrailingGarbage` is false, then the underlying input >> must contain exactly one (if `allowConcatenation` is false) or exactly N (if >> `allowConcatenation` is true) concatenated GZIP data streams, otherwise an >> exception is guaranteed. > > Archie Cobbs has updated the pull request incrementally with one additional > commit since the last revision: > > Shorten the description of concatenation behavior per review comments. The parameter source methods in both tests could benefit from using the `Arguments` class in JUnit, see comments for details. test/jdk/java/util/zip/GZIP/GZIPInputStreamConcat.java line 44: > 42: private int bufsize; > 43: > 44: public static Stream<Object[]> testScenarios() throws IOException { I think the best practice for JUnit 5 when providing more than one argument is to return a `Stream<Arguments>`. (IntelliJ IDEA also suggests this in a code inspection) So, something like: public static Stream<Arguments> testScenarios() throws IOException { // Test concat vs. non-concat, garbage vs. no-garbage, and various buffer sizes on random data Random random = new Random(); List<Arguments> scenarios = new ArrayList<>(); for (int bufsize = 1; bufsize < 1024; bufsize += random.nextInt(32) + 1) { scenarios.add(Arguments.of(randomData(0, 100), bufsize)); } return scenarios.stream(); } test/jdk/java/util/zip/GZIP/GZIPInputStreamGzipCommand.java line 44: > 42: > 43: public static Stream<String[]> gzipScenarios() throws IOException { > 44: final ArrayList<String[]> scenarios = new ArrayList(); Similar to the comment for the other test, this could be: return Stream.of(Arguments.of(expected, hex), Arguments.of(expected, hex), ...); test/jdk/java/util/zip/GZIP/GZIPInputStreamGzipCommand.java line 122: > 120: > 121: // Get expected result > 122: final byte[] expected = input.getBytes(StandardCharsets.UTF_8); IMHO the use of `final` for local variables in this method adds visual clutter without providing much value. ------------- PR Review: https://git.openjdk.org/jdk/pull/18385#pullrequestreview-2269041615 PR Review Comment: https://git.openjdk.org/jdk/pull/18385#discussion_r1736385638 PR Review Comment: https://git.openjdk.org/jdk/pull/18385#discussion_r1736394966 PR Review Comment: https://git.openjdk.org/jdk/pull/18385#discussion_r1736428169