On Sat, 27 Jul 2024 15:00:51 GMT, Archie Cobbs <aco...@openjdk.org> wrote:

>> `GZIPInputStream` supports reading data from multiple concatenated GZIP data 
>> streams since [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). In 
>> order to do this, after a GZIP trailer frame is read, it attempts to read a 
>> GZIP header frame and, if successful, proceeds onward to decompress the new 
>> stream. If the attempt to decode a GZIP header frame fails, or happens to 
>> trigger an `IOException`, it just ignores the trailing garbage and/or the 
>> `IOException` and returns EOF.
>> 
>> There are several issues with this:
>> 
>> 1. The behaviors of (a) supporting concatenated streams and (b) ignoring 
>> trailing garbage are not documented, much less precisely specified.
>> 2. Ignoring trailing garbage is dubious because it could easily hide errors 
>> or other data corruption that an application would rather be notified about. 
>> Moreover, the API claims that a `ZipException` will be thrown when corrupt 
>> data is read, but obviously that doesn't happen in the trailing garbage 
>> scenario (so N concatenated streams where the last one has a corrupted 
>> header frame is indistinguishable from N-1 valid streams).
>> 3. There's no way to create a `GZIPInputStream` that does _not_ support 
>> stream concatenation.
>> 
>> On the other hand, `GZIPInputStream` is an old class with lots of existing 
>> usage, so it's important to preserve the existing behavior, warts and all 
>> (note: my the definition of "existing behavior" here includes the bug fix in 
>> [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)).
>> 
>> So this patch adds a new constructor that takes two new parameters 
>> `allowConcatenation` and `allowTrailingGarbage`. The legacy behavior is 
>> enabled by setting both to true; otherwise, they do what they sound like. In 
>> particular, when `allowTrailingGarbage` is false, then the underlying input 
>> must contain exactly one (if `allowConcatenation` is false) or exactly N (if 
>> `allowConcatenation` is true) concatenated GZIP data streams, otherwise an 
>> exception is guaranteed.
>
> Archie Cobbs has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - Refactor to eliminate "lenient" mode (still failng test: GZIPInZip.java).
>  - Add GZIP input test for various gzip(1) command outputs.

Hello Archie,

> That sounds reasonable. Importantly it doesn't specify the swallowing of 
> IOException's, which means hopefully someday we can stop doing that...
>
> Would you guys recommend including some kind of additional statement like 
> this?
> If an IOException is thrown while attempting to read or decode a GZIP header 
> following a previous GZIP stream's trailer...

We should leave out any mention of IOException from the javadoc specification. 
That will then make this an implementation detail and like you note, it will 
then allow us to decide in future how we want to deal with that specific part 
of the code.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18385#issuecomment-2282732551

Reply via email to