Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v7]

Archie Cobbs Fri, 26 Jul 2024 09:50:24 -0700

On Fri, 26 Jul 2024 16:12:11 GMT, Jaikiran Pai <j...@openjdk.org> wrote:


>> Archie Cobbs has updated the pull request with a new target base due to a 
>> merge or a rebase. The pull request now contains 16 commits:
>> 
>>  - Merge branch 'master' into JDK-8322256
>>  - Wording tweak.
>>  - Change concatenation policy config from booleans to new ConcatPolicy enum.
>>  - Merge branch 'master' into JDK-8322256
>>  - Bump @since from 23 to 24.
>>  - Merge branch 'master' into JDK-8322256
>>  - Relabel "trailing garbage" as "extra bytes" to sound less accusatory.
>>  - Simplify code by eliminating an impossible case.
>>  - Field name change and Javadoc wording tweaks.
>>  - Merge branch 'master' into JDK-8322256
>>  - ... and 6 more: https://git.openjdk.org/jdk/compare/332df83e...72b60092
>
> Hello Archie, I don't think we need any enum in the API. GZIPInputStream 
> class should only allow parsing GZIP streams. The InputStream passed to the 
> GZIPInputStream constructor must be a InputStream which has either 1 GZIP 
> stream or N GZIP streams (concatenated). The user should be allowed to state 
> whether the passed InputStream is expected contain more than 1 GZIP stream. 
> This is the only addition that I think we should be introducing. 
> 
> The way to allow users to state that the InputStream being passed is allowed 
> to have more than 1 GZIP stream is through the new proposed boolean to the 
> constructor. The boolean can be named "allowConcatenated".
> 
> When allowConcatenated is true, then during the read() operation on the 
> GZIPInputStream, when we internally read a GZIP stream trailer, we should 
> check if the underlying InputStream has a next GZIP stream header:
>       - If there's no bytes present after the trailer, we just return 
> whatever data we have inflated so far and that also would be the 
> end-of-stream of the GZIPInputStream. 
>       - If there's bytes after the trailer, then those next bytes MUST be the 
> next GZIP stream header. If those bytes aren't a GZIP stream header, then we 
> throw an IOException. This implies whatever data we had deflated and any 
> additional bytes that we had read after the trailer will not be returned to 
> the user. That's fine because the underlying InputStream was "corrupt" - the 
> GZIPInputStream shouldn't expect any other data other than GZIP stream(s) in 
> the underlying InputStream.
> 
> When allowConcatenated is false, then during the read() operation on the 
> GZIPInputStream, when we internally read a GZIP stream trailer, we should 
> check if the underlying InputStream has any more bytes remaining:
>       - If there are bytes remaining then we consider that an error and raise 
> an IOException. We do this because we were instructed not to allow 
> concatenated GZIP streams. So presence of any data after the first GZIP 
> stream trailer should be an exception.
>       - If there are no more bytes remaining then we have reached the 
> end-of-stream and we return back whatever GZIP data we have inflated during 
> the read operation and that also would be the end-of-stream of the GZIPStream 
> instance.
> 
> The default value for allowConcatenated should be true to allow for 
> supporting concatenated GZIP streams (like we do now). There however will be 
> a change in behaviour if the stream content after the trailer isn't a GZIP 
> stream. With this new proposed change we will throw an IOException ...

Hi @jaikiran,

> There however will be a change in behaviour if the stream content after the 
> trailer isn't a GZIP stream. With this new proposed change we will throw an 
> IOException (unlike previously). I think that's fine and in fact the correct 
> thing to do.

Ah! This simplifies everything. I was just assuming that preserving the 
existing behavior was a hard requirement - but if not, we can make this class 
"always precise", and the only thing left to configure is whether to allow 1 
stream or N streams. I am in complete agreement with you now, especially 
because this eliminates the aforementioned security concern.

I'm assuming you and @LanceAndersen have communicated and he is OK with this as 
well. I will update the PR and CSR.

Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18385#issuecomment-2253125776

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v7]

Reply via email to