On Wed, 28 Aug 2024 21:56:50 GMT, Archie Cobbs <aco...@openjdk.org> wrote:
>> `GZIPInputStream` supports reading data from multiple concatenated GZIP data >> streams since [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). In >> order to do this, after a GZIP trailer frame is read, it attempts to read a >> GZIP header frame and, if successful, proceeds onward to decompress the new >> stream. If the attempt to decode a GZIP header frame fails, or happens to >> trigger an `IOException`, it just ignores the trailing garbage and/or the >> `IOException` and returns EOF. >> >> There are several issues with this: >> >> 1. The behaviors of (a) supporting concatenated streams and (b) ignoring >> trailing garbage are not documented, much less precisely specified. >> 2. Ignoring trailing garbage is dubious because it could easily hide errors >> or other data corruption that an application would rather be notified about. >> Moreover, the API claims that a `ZipException` will be thrown when corrupt >> data is read, but obviously that doesn't happen in the trailing garbage >> scenario (so N concatenated streams where the last one has a corrupted >> header frame is indistinguishable from N-1 valid streams). >> 3. There's no way to create a `GZIPInputStream` that does _not_ support >> stream concatenation. >> >> On the other hand, `GZIPInputStream` is an old class with lots of existing >> usage, so it's important to preserve the existing behavior, warts and all >> (note: my the definition of "existing behavior" here includes the bug fix in >> [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)). >> >> So this patch adds a new constructor that takes two new parameters >> `allowConcatenation` and `allowTrailingGarbage`. The legacy behavior is >> enabled by setting both to true; otherwise, they do what they sound like. In >> particular, when `allowTrailingGarbage` is false, then the underlying input >> must contain exactly one (if `allowConcatenation` is false) or exactly N (if >> `allowConcatenation` is true) concatenated GZIP data streams, otherwise an >> exception is guaranteed. > > Archie Cobbs has updated the pull request incrementally with one additional > commit since the last revision: > > Shorten the description of concatenation behavior per review comments. I think the `testScenario` method would be easier to read if we dropped the assigments to the `input` and `output` variables, and if the `testDecomp` method was split into separate methods for expecting a successful vs. failing deompression. See comments for details. test/jdk/java/util/zip/GZIP/GZIPInputStreamConcat.java line 48: > 46: // Test concat vs. non-concat, garbage vs. no-garbage, and > various buffer sizes on random data > 47: Random random = new Random(); > 48: final ArrayList<List<Object>> scenarios = new ArrayList<>(); Suggestion: List<List<Object>> scenarios = new ArrayList<>(); test/jdk/java/util/zip/GZIP/GZIPInputStreamConcat.java line 49: > 47: Random random = new Random(); > 48: final ArrayList<List<Object>> scenarios = new ArrayList<>(); > 49: for (int size = 1; size < 1024; size += random.nextInt(32) + 1) { It's not obviously clear that `size` is the buffer size passed to GZIPInputStream here and not related to the size of the uncompressed input data. Perhaps rename this to `bufferSize` and/or document the parameters via a comment? `// Parameters: byte[] data, int bufferSize` test/jdk/java/util/zip/GZIP/GZIPInputStreamConcat.java line 69: > 67: // Decompress a single stream with no extra garbage - should > always work > 68: byte[] input = compressed; > 69: byte[] output = uncompressed; The assignment of `input` and `output` variables in this method add a lot of noise. Could this be simplified to something like this? // Compress the test data byte[] compressed = deflate(uncompressed); // Decompress a single stream with no extra garbage - should always work testDecomp(compressed, uncompressed); // Decompress a truncated GZIP header testDecompFail(oneByteShort(gzipHeader()), EOFException.class); // Decompress a single stream that is one byte short - should always fail testDecompFail(oneByteShort(compressed), EOFException.class); // Decompress a single stream with one byte of extra garbage (trying all 256 possible values) for (int extra = 0; extra < 0x100; extra++) { testDecomp(oneByteLong(compressed, extra), uncompressed); } // Decompress a single stream followed by a truncated GZIP header testDecomp(concat(compressed, oneByteShort(gzipHeader())), uncompressed); // Decompress a single stream followed by another stream that is one byte short testDecompFail(concat(compressed, oneByteShort(compressed)), IOException.class); // Decompress two streams concatenated testDecomp(concat(compressed, compressed), concat(uncompressed, uncompressed)); // Decompress three streams concatenated testDecomp(concat(compressed, compressed, compressed), concat(uncompressed, uncompressed, uncompressed)); // Decompress three streams concatenated followed by a truncated GZIP header testDecomp(concat(compressed, compressed, compressed, oneByteShort(gzipHeader())), concat(uncompressed, uncompressed, uncompressed)); test/jdk/java/util/zip/GZIP/GZIPInputStreamConcat.java line 142: > 140: System.arraycopy(array, 0, array2, 0, array.length); > 141: array2[array.length] = (byte)value; > 142: return array2; Would it make sense to do the following here? Suggestion: return concat(array, new byte[] {(byte) value}); test/jdk/java/util/zip/GZIP/GZIPInputStreamConcat.java line 167: > 165: > 166: // GZIP compress data > 167: public static byte[] deflate(byte[] data) throws IOException { The returned data is in the GZIP format, not the DEFLATE format. So could be `gzip` rather than `deflate`. test/jdk/java/util/zip/GZIP/GZIPInputStreamConcat.java line 176: > 174: > 175: // GZIP decompress data > 176: public byte[] inflate(InputStream in) throws IOException { The stream here is in the GZIP format, not the DEFLATE format. So could be `gunzip` rather than `inflate`. ------------- PR Review: https://git.openjdk.org/jdk/pull/18385#pullrequestreview-2268596523 PR Review Comment: https://git.openjdk.org/jdk/pull/18385#discussion_r1736128032 PR Review Comment: https://git.openjdk.org/jdk/pull/18385#discussion_r1736160530 PR Review Comment: https://git.openjdk.org/jdk/pull/18385#discussion_r1736229023 PR Review Comment: https://git.openjdk.org/jdk/pull/18385#discussion_r1736125278 PR Review Comment: https://git.openjdk.org/jdk/pull/18385#discussion_r1736119267 PR Review Comment: https://git.openjdk.org/jdk/pull/18385#discussion_r1736120221