On Wed, 7 Jan 2026 13:27:43 GMT, Jaikiran Pai <[email protected]> wrote:

> Can I please get a review of this change which proposes to address a 
> performance regression in `java.util.zip.GZIPInputStream`? 
> 
> In JDK 23, we updated `GZIPInputStream` to no longer rely on 
> `InputStream.available()` method when determining if a `InputStream` consists 
> of additional (concatenated) GZIP streams. That was done in 
> https://bugs.openjdk.org/browse/JDK-7036144. That change replaced the use of 
> `InputStream.available()` with explicit call(s) to `InputStream.read()` to 
> detect and read any additional concatenated GZIP stream header.
> 
> Given how old that existing code in `GZIPInputStream` was, efforts were made 
> to ensure that it wouldn't break existing applications, for the common use 
> case. A release note too was added in an effort to highlight this change.
> 
> Unfortunately, when reviewing and experimenting with this change I didn't 
> realize that when we try to detect additional bytes in the stream which has 
> already reached EOF, the `EOFException` that gets raised by this code could 
> contribute to a performance degradation. The code does handle this exception 
> appropriately, but it does mean that in the common case of a stream formed of 
> just one GZIP stream, we would now end up constructing an `EOFException` in 
> this code path.
> 
> Construction of `Throwable` instances involves filling the stacktrace into 
> that instance and is a relatively expensive operation. As a result, we ended 
> up regressing the performance of this code path for the common use case of 
> parsing a non-concatenated GZIP stream.
> 
> The commit in this PR addresses this issue by introducing an alternate code 
> path in the (private) `readHeader()` method to prevent an `EOFException` from 
> being generated when the stream has already reached EOF and we are trying to 
> determine if there's any additional concatenated GZIP stream header in the 
> stream. This addresses the performance regression as well as lets us retain 
> the changes that we had done in JDK-7036144 to prevent the usage of 
> `InputStream.available()`.
> 
> The reporter of this performance regression provided a JMH benchmark. I've 
> run it against JDK 22 (the version that doesn't contain the JDK-7036144 
> change) , JDK 25 (latest JDK version which contains the JDK-7036144 change) 
> and with this change proposed in this PR. I can see that with this change, 
> the performance numbers improve and go back to the JDK 22 levels. I haven't 
> included that JMH benchmark in this PR and am looking for inputs on whether 
> we should include it.
> 
> No new regression test has been in...

This pull request has now been integrated.

Changeset: a4fb07ee
Author:    Jaikiran Pai <[email protected]>
URL:       
https://git.openjdk.org/jdk/commit/a4fb07ee3e26c2f0ed3111c39c3a22167d292d04
Stats:     49 lines in 1 file changed: 41 ins; 1 del; 7 mod

8374644: Regression in GZIPInputStream performance after JDK-7036144

Reviewed-by: lancea, alanb

-------------

PR: https://git.openjdk.org/jdk/pull/29092

Reply via email to