On Fri, 26 Jul 2024 16:12:11 GMT, Jaikiran Pai <j...@openjdk.org> wrote:
>> Archie Cobbs has updated the pull request with a new target base due to a >> merge or a rebase. The pull request now contains 16 commits: >> >> - Merge branch 'master' into JDK-8322256 >> - Wording tweak. >> - Change concatenation policy config from booleans to new ConcatPolicy enum. >> - Merge branch 'master' into JDK-8322256 >> - Bump @since from 23 to 24. >> - Merge branch 'master' into JDK-8322256 >> - Relabel "trailing garbage" as "extra bytes" to sound less accusatory. >> - Simplify code by eliminating an impossible case. >> - Field name change and Javadoc wording tweaks. >> - Merge branch 'master' into JDK-8322256 >> - ... and 6 more: https://git.openjdk.org/jdk/compare/332df83e...72b60092 > > Hello Archie, I don't think we need any enum in the API. GZIPInputStream > class should only allow parsing GZIP streams. The InputStream passed to the > GZIPInputStream constructor must be a InputStream which has either 1 GZIP > stream or N GZIP streams (concatenated). The user should be allowed to state > whether the passed InputStream is expected contain more than 1 GZIP stream. > This is the only addition that I think we should be introducing. > > The way to allow users to state that the InputStream being passed is allowed > to have more than 1 GZIP stream is through the new proposed boolean to the > constructor. The boolean can be named "allowConcatenated". > > When allowConcatenated is true, then during the read() operation on the > GZIPInputStream, when we internally read a GZIP stream trailer, we should > check if the underlying InputStream has a next GZIP stream header: > - If there's no bytes present after the trailer, we just return > whatever data we have inflated so far and that also would be the > end-of-stream of the GZIPInputStream. > - If there's bytes after the trailer, then those next bytes MUST be the > next GZIP stream header. If those bytes aren't a GZIP stream header, then we > throw an IOException. This implies whatever data we had deflated and any > additional bytes that we had read after the trailer will not be returned to > the user. That's fine because the underlying InputStream was "corrupt" - the > GZIPInputStream shouldn't expect any other data other than GZIP stream(s) in > the underlying InputStream. > > When allowConcatenated is false, then during the read() operation on the > GZIPInputStream, when we internally read a GZIP stream trailer, we should > check if the underlying InputStream has any more bytes remaining: > - If there are bytes remaining then we consider that an error and raise > an IOException. We do this because we were instructed not to allow > concatenated GZIP streams. So presence of any data after the first GZIP > stream trailer should be an exception. > - If there are no more bytes remaining then we have reached the > end-of-stream and we return back whatever GZIP data we have inflated during > the read operation and that also would be the end-of-stream of the GZIPStream > instance. > > The default value for allowConcatenated should be true to allow for > supporting concatenated GZIP streams (like we do now). There however will be > a change in behaviour if the stream content after the trailer isn't a GZIP > stream. With this new proposed change we will throw an IOException ... Hi @jaikiran, > There however will be a change in behaviour if the stream content after the > trailer isn't a GZIP stream. With this new proposed change we will throw an > IOException (unlike previously). I think that's fine and in fact the correct > thing to do. Ah! This simplifies everything. I was just assuming that preserving the existing behavior was a hard requirement - but if not, we can make this class "always precise", and the only thing left to configure is whether to allow 1 stream or N streams. I am in complete agreement with you now, especially because this eliminates the aforementioned security concern. I'm assuming you and @LanceAndersen have communicated and he is OK with this as well. I will update the PR and CSR. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/18385#issuecomment-2253125776