> Can I please get a review of this change which proposes to address a 
> performance regression in `java.util.zip.GZIPInputStream`? 
> 
> In JDK 23, we updated `GZIPInputStream` to no longer rely on 
> `InputStream.available()` method when determining if a `InputStream` consists 
> of additional (concatenated) GZIP streams. That was done in 
> https://bugs.openjdk.org/browse/JDK-7036144. That change replaced the use of 
> `InputStream.available()` with explicit call(s) to `InputStream.read()` to 
> detect and read any additional concatenated GZIP stream header.
> 
> Given how old that existing code in `GZIPInputStream` was, efforts were made 
> to ensure that it wouldn't break existing applications, for the common use 
> case. A release note too was added in an effort to highlight this change.
> 
> Unfortunately, when reviewing and experimenting with this change I didn't 
> realize that when we try to detect additional bytes in the stream which has 
> already reached EOF, the `EOFException` that gets raised by this code could 
> contribute to a performance degradation. The code does handle this exception 
> appropriately, but it does mean that in the common case of a stream formed of 
> just one GZIP stream, we would now end up constructing an `EOFException` in 
> this code path.
> 
> Construction of `Throwable` instances involves filling the stacktrace into 
> that instance and is a relatively expensive operation. As a result, we ended 
> up regressing the performance of this code path for the common use case of 
> parsing a non-concatenated GZIP stream.
> 
> The commit in this PR addresses this issue by introducing an alternate code 
> path in the (private) `readHeader()` method to prevent an `EOFException` from 
> being generated when the stream has already reached EOF and we are trying to 
> determine if there's any additional concatenated GZIP stream header in the 
> stream. This addresses the performance regression as well as lets us retain 
> the changes that we had done in JDK-7036144 to prevent the usage of 
> `InputStream.available()`.
> 
> The reporter of this performance regression provided a JMH benchmark. I've 
> run it against JDK 22 (the version that doesn't contain the JDK-7036144 
> change) , JDK 25 (latest JDK version which contains the JDK-7036144 change) 
> and with this change proposed in this PR. I can see that with this change, 
> the performance numbers improve and go back to the JDK 22 levels. I haven't 
> included that JMH benchmark in this PR and am looking for inputs on whether 
> we should include it.
> 
> No new regression test has been in...

Jaikiran Pai has updated the pull request with a new target base due to a merge 
or a rebase. The incremental webrev excludes the unrelated changes brought in 
by the merge/rebase. The pull request contains six additional commits since the 
last revision:

 - merge latest from master branch
 - add a comment explaining the "true" value being passed to readHeader
 - review suggestion - assert not needed for now
 - add a comment to clarify the short value being read for GZIP magic header
 - use -1 to represent absence of a GZIP header, from readHeader() method
 - 8374644: Regression in GZIPInputStream performance after JDK-7036144

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/29092/files
  - new: https://git.openjdk.org/jdk/pull/29092/files/06e7b370..e0124326

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=29092&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29092&range=01-02

  Stats: 971 lines in 90 files changed: 429 ins; 247 del; 295 mod
  Patch: https://git.openjdk.org/jdk/pull/29092.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/29092/head:pull/29092

PR: https://git.openjdk.org/jdk/pull/29092

Reply via email to