On Sat, 27 Jul 2024 15:00:51 GMT, Archie Cobbs <aco...@openjdk.org> wrote:
>> `GZIPInputStream` supports reading data from multiple concatenated GZIP data >> streams since [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). In >> order to do this, after a GZIP trailer frame is read, it attempts to read a >> GZIP header frame and, if successful, proceeds onward to decompress the new >> stream. If the attempt to decode a GZIP header frame fails, or happens to >> trigger an `IOException`, it just ignores the trailing garbage and/or the >> `IOException` and returns EOF. >> >> There are several issues with this: >> >> 1. The behaviors of (a) supporting concatenated streams and (b) ignoring >> trailing garbage are not documented, much less precisely specified. >> 2. Ignoring trailing garbage is dubious because it could easily hide errors >> or other data corruption that an application would rather be notified about. >> Moreover, the API claims that a `ZipException` will be thrown when corrupt >> data is read, but obviously that doesn't happen in the trailing garbage >> scenario (so N concatenated streams where the last one has a corrupted >> header frame is indistinguishable from N-1 valid streams). >> 3. There's no way to create a `GZIPInputStream` that does _not_ support >> stream concatenation. >> >> On the other hand, `GZIPInputStream` is an old class with lots of existing >> usage, so it's important to preserve the existing behavior, warts and all >> (note: my the definition of "existing behavior" here includes the bug fix in >> [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)). >> >> So this patch adds a new constructor that takes two new parameters >> `allowConcatenation` and `allowTrailingGarbage`. The legacy behavior is >> enabled by setting both to true; otherwise, they do what they sound like. In >> particular, when `allowTrailingGarbage` is false, then the underlying input >> must contain exactly one (if `allowConcatenation` is false) or exactly N (if >> `allowConcatenation` is true) concatenated GZIP data streams, otherwise an >> exception is guaranteed. > > Archie Cobbs has updated the pull request incrementally with two additional > commits since the last revision: > > - Refactor to eliminate "lenient" mode (still failng test: GZIPInZip.java). > - Add GZIP input test for various gzip(1) command outputs. Consider the following simple gzip test gz % cat batman.txt I am batman gz % cat hello.txt hello % cat robin.txt Robin Here gz % cat Joker.txt I am the Joker gz % gzip -c batman.txt hello.txt >> bats.gz gz % gunzip -c bats I am batman hello gz % cp bats.gz joker.gz gz % cat Joker.txt >> joker.gz gz % gunzip -c joker I am batman hello gunzip: joker.gz: trailing garbage ignored gz % cp joker.gz badRobin.gz gz % gzip -c robin.txt >> badRobin.gz gz % gunzip -c badRobin I am batman hello gunzip: badRobin.gz: trailing garbage ignored Here is the hexdump of the badRobin.gz gz % hexdump -C badRobin.gz 00000000 1f 8b 08 08 81 13 a9 66 00 03 62 61 74 6d 61 6e |.......f..batman| 00000010 2e 74 78 74 00 f3 54 48 cc 55 48 4a 2c c9 4d cc |.txt..TH.UHJ,.M.| 00000020 e3 02 00 f0 49 dd 72 0c 00 00 00 1f 8b 08 08 8f |....I.r.........| 00000030 13 a9 66 00 03 68 65 6c 6c 6f 2e 74 78 74 00 cb |..f..hello.txt..| 00000040 48 cd c9 c9 e7 02 00 20 30 3a 36 06 00 00 00 49 |H...... 0:6....I| 00000050 20 61 6d 20 74 68 65 20 4a 6f 6b 65 72 0a 1f 8b | am the Joker...| 00000060 08 08 66 0f a9 66 00 03 72 6f 62 69 6e 2e 74 78 |..f..f..robin.tx| 00000070 74 00 0b ca 4f ca cc 53 f0 48 2d 4a e5 02 00 b3 |t...O..S.H-J....| 00000080 74 fc c1 0b 00 00 00 |t......| 00000087 The following program will use GZIPInputStream and GzipCompressorinputStream to access badRobin.gz package gzip; import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; import org.testng.annotations.Test; import java.io.FileInputStream; import java.io.IOException; import java.util.zip.GZIPInputStream; public class GZipInputStreamTest { @Test public void openWithGZipInputStream() throws IOException { String f = "gz/badRobin.gz"; try (var is = new FileInputStream(f); var gis = new GZIPInputStream(is)) { var bytes = gis.readAllBytes(); System.out.printf("contents: %s%n", new String(bytes)); } } @Test public void openWithGzipCompressorInputStream() throws IOException { String f = "gz/badRobin.gz";; try (var is = new FileInputStream(f); var giz = new GzipCompressorInputStream(is, true)) { var bytes = giz.readAllBytes(); System.out.printf("contents: %s%n", new String(bytes)); } } } The output from openWithGZipInputStream(): contents: I am batman hello The output from openWithGzipCompressorInputStream(): java.io.IOException: Garbage after a valid .gz stream at org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream.init(GzipCompressorInputStream.java:193) at org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream.read(GzipCompressorInputStream.java:356) at java.base/java.io.InputStream.readNBytes(InputStream.java:412) at java.base/java.io.InputStream.readAllBytes(InputStream.java:349) at gzip.GZipInputStreamTest.openWithGzipCompressorInputStream(GZipInputStreamTest.java:31) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) at java.base/java.lang.reflect.Method.invoke(Method.java:580) at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132) at org.testng.internal.TestInvoker.invokeMethod(TestInvoker.java:599) at org.testng.internal.TestInvoker.invokeTestMethod(TestInvoker.java:174) at org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:46) at org.testng.internal.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:822) at org.testng.internal.TestInvoker.invokeTestMethods(TestInvoker.java:147) at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146) at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:128) at java.base/java.util.ArrayList.forEach(ArrayList.java:1597) at org.testng.TestRunner.privateRun(TestRunner.java:764) at org.testng.TestRunner.run(TestRunner.java:585) at org.testng.SuiteRunner.runTest(SuiteRunner.java:384) at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:378) at org.testng.SuiteRunner.privateRun(SuiteRunner.java:337) at org.testng.SuiteRunner.run(SuiteRunner.java:286) at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:53) at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:96) at org.testng.TestNG.runSuitesSequentially(TestNG.java:1218) at org.testng.TestNG.runSuitesLocally(TestNG.java:1140) at org.testng.TestNG.runSuites(TestNG.java:1069) at org.testng.TestNG.run(TestNG.java:1037) at com.intellij.rt.testng.IDEARemoteTestNG.run(IDEARemoteTestNG.java:66) at com.intellij.rt.testng.RemoteTestNGStarter.main(RemoteTestNGStarter.java:105) The behavior of GZIPInputStream is in-line with gzip and gunzip where it stops processing after encountering bad data after the 8 byte end trailer containing the CRC32 and ISIZE fields . GzipCompressorInputStream throws an IOException without returning any data. WinZip on my Mac also opens badRobin.gz similar to gunzip, that is does not return an error. Based on the above, I am reluctant to change the current behavior given it appears to have been modeled after gzip/gunzip as well as WinZip. As far as providing a means to fail whenever there is bad data after the end trailer, I think we would need more input on the merits supporting this and how the behavior would be model if GZIPInputStream was enhanced. > Did you double check the unviewable issues? What does JDK-8023431 say? It is just a fix for an internal intermittent test run failure. No additional details ------------- PR Comment: https://git.openjdk.org/jdk/pull/18385#issuecomment-2258868069