On Thu, 8 Jan 2026 13:24:57 GMT, Jaikiran Pai <[email protected]> wrote:
>> Can I please get a review of this change which proposes to address a >> performance regression in `java.util.zip.GZIPInputStream`? >> >> In JDK 23, we updated `GZIPInputStream` to no longer rely on >> `InputStream.available()` method when determining if a `InputStream` >> consists of additional (concatenated) GZIP streams. That was done in >> https://bugs.openjdk.org/browse/JDK-7036144. That change replaced the use of >> `InputStream.available()` with explicit call(s) to `InputStream.read()` to >> detect and read any additional concatenated GZIP stream header. >> >> Given how old that existing code in `GZIPInputStream` was, efforts were made >> to ensure that it wouldn't break existing applications, for the common use >> case. A release note too was added in an effort to highlight this change. >> >> Unfortunately, when reviewing and experimenting with this change I didn't >> realize that when we try to detect additional bytes in the stream which has >> already reached EOF, the `EOFException` that gets raised by this code could >> contribute to a performance degradation. The code does handle this exception >> appropriately, but it does mean that in the common case of a stream formed >> of just one GZIP stream, we would now end up constructing an `EOFException` >> in this code path. >> >> Construction of `Throwable` instances involves filling the stacktrace into >> that instance and is a relatively expensive operation. As a result, we ended >> up regressing the performance of this code path for the common use case of >> parsing a non-concatenated GZIP stream. >> >> The commit in this PR addresses this issue by introducing an alternate code >> path in the (private) `readHeader()` method to prevent an `EOFException` >> from being generated when the stream has already reached EOF and we are >> trying to determine if there's any additional concatenated GZIP stream >> header in the stream. This addresses the performance regression as well as >> lets us retain the changes that we had done in JDK-7036144 to prevent the >> usage of `InputStream.available()`. >> >> The reporter of this performance regression provided a JMH benchmark. I've >> run it against JDK 22 (the version that doesn't contain the JDK-7036144 >> change) , JDK 25 (latest JDK version which contains the JDK-7036144 change) >> and with this change proposed in this PR. I can see that with this change, >> the performance numbers improve and go back to the JDK 22 levels. I haven't >> included that JMH benchmark in this PR and am looking for inputs on whether >> we should include it. >> >> No n... > > Jaikiran Pai has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains six additional > commits since the last revision: > > - merge latest from master branch > - add a comment explaining the "true" value being passed to readHeader > - review suggestion - assert not needed for now > - add a comment to clarify the short value being read for GZIP magic header > - use -1 to represent absence of a GZIP header, from readHeader() method > - 8374644: Regression in GZIPInputStream performance after JDK-7036144 Thank you for the additional updates Jai. ------------- Marked as reviewed by lancea (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29092#pullrequestreview-3639422345
