> Can I please get a review of this change which proposes to address a > performance regression in `java.util.zip.GZIPInputStream`? > > In JDK 23, we updated `GZIPInputStream` to no longer rely on > `InputStream.available()` method when determining if a `InputStream` consists > of additional (concatenated) GZIP streams. That was done in > https://bugs.openjdk.org/browse/JDK-7036144. That change replaced the use of > `InputStream.available()` with explicit call(s) to `InputStream.read()` to > detect and read any additional concatenated GZIP stream header. > > Given how old that existing code in `GZIPInputStream` was, efforts were made > to ensure that it wouldn't break existing applications, for the common use > case. A release note too was added in an effort to highlight this change. > > Unfortunately, when reviewing and experimenting with this change I didn't > realize that when we try to detect additional bytes in the stream which has > already reached EOF, the `EOFException` that gets raised by this code could > contribute to a performance degradation. The code does handle this exception > appropriately, but it does mean that in the common case of a stream formed of > just one GZIP stream, we would now end up constructing an `EOFException` in > this code path. > > Construction of `Throwable` instances involves filling the stacktrace into > that instance and is a relatively expensive operation. As a result, we ended > up regressing the performance of this code path for the common use case of > parsing a non-concatenated GZIP stream. > > The commit in this PR addresses this issue by introducing an alternate code > path in the (private) `readHeader()` method to prevent an `EOFException` from > being generated when the stream has already reached EOF and we are trying to > determine if there's any additional concatenated GZIP stream header in the > stream. This addresses the performance regression as well as lets us retain > the changes that we had done in JDK-7036144 to prevent the usage of > `InputStream.available()`. > > The reporter of this performance regression provided a JMH benchmark. I've > run it against JDK 22 (the version that doesn't contain the JDK-7036144 > change) , JDK 25 (latest JDK version which contains the JDK-7036144 change) > and with this change proposed in this PR. I can see that with this change, > the performance numbers improve and go back to the JDK 22 levels. I haven't > included that JMH benchmark in this PR and am looking for inputs on whether > we should include it. > > No new regression test has been in...
Jaikiran Pai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - merge latest from master branch - add a comment explaining the "true" value being passed to readHeader - review suggestion - assert not needed for now - add a comment to clarify the short value being read for GZIP magic header - use -1 to represent absence of a GZIP header, from readHeader() method - 8374644: Regression in GZIPInputStream performance after JDK-7036144 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29092/files - new: https://git.openjdk.org/jdk/pull/29092/files/06e7b370..e0124326 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29092&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29092&range=01-02 Stats: 971 lines in 90 files changed: 429 ins; 247 del; 295 mod Patch: https://git.openjdk.org/jdk/pull/29092.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29092/head:pull/29092 PR: https://git.openjdk.org/jdk/pull/29092
