On Thu, 8 Jan 2026 13:24:57 GMT, Jaikiran Pai <[email protected]> wrote:

>> Can I please get a review of this change which proposes to address a 
>> performance regression in `java.util.zip.GZIPInputStream`? 
>> 
>> In JDK 23, we updated `GZIPInputStream` to no longer rely on 
>> `InputStream.available()` method when determining if a `InputStream` 
>> consists of additional (concatenated) GZIP streams. That was done in 
>> https://bugs.openjdk.org/browse/JDK-7036144. That change replaced the use of 
>> `InputStream.available()` with explicit call(s) to `InputStream.read()` to 
>> detect and read any additional concatenated GZIP stream header.
>> 
>> Given how old that existing code in `GZIPInputStream` was, efforts were made 
>> to ensure that it wouldn't break existing applications, for the common use 
>> case. A release note too was added in an effort to highlight this change.
>> 
>> Unfortunately, when reviewing and experimenting with this change I didn't 
>> realize that when we try to detect additional bytes in the stream which has 
>> already reached EOF, the `EOFException` that gets raised by this code could 
>> contribute to a performance degradation. The code does handle this exception 
>> appropriately, but it does mean that in the common case of a stream formed 
>> of just one GZIP stream, we would now end up constructing an `EOFException` 
>> in this code path.
>> 
>> Construction of `Throwable` instances involves filling the stacktrace into 
>> that instance and is a relatively expensive operation. As a result, we ended 
>> up regressing the performance of this code path for the common use case of 
>> parsing a non-concatenated GZIP stream.
>> 
>> The commit in this PR addresses this issue by introducing an alternate code 
>> path in the (private) `readHeader()` method to prevent an `EOFException` 
>> from being generated when the stream has already reached EOF and we are 
>> trying to determine if there's any additional concatenated GZIP stream 
>> header in the stream. This addresses the performance regression as well as 
>> lets us retain the changes that we had done in JDK-7036144 to prevent the 
>> usage of `InputStream.available()`.
>> 
>> The reporter of this performance regression provided a JMH benchmark. I've 
>> run it against JDK 22 (the version that doesn't contain the JDK-7036144 
>> change) , JDK 25 (latest JDK version which contains the JDK-7036144 change) 
>> and with this change proposed in this PR. I can see that with this change, 
>> the performance numbers improve and go back to the JDK 22 levels. I haven't 
>> included that JMH benchmark in this PR and am looking for inputs on whether 
>> we should include it.
>> 
>> No n...
>
> Jaikiran Pai has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains six additional 
> commits since the last revision:
> 
>  - merge latest from master branch
>  - add a comment explaining the "true" value being passed to readHeader
>  - review suggestion - assert not needed for now
>  - add a comment to clarify the short value being read for GZIP magic header
>  - use -1 to represent absence of a GZIP header, from readHeader() method
>  - 8374644: Regression in GZIPInputStream performance after JDK-7036144

Thank you Alan and Lance for the reviews. tier1, tier2 and tier3 testing 
completed without issues. I'll go ahead and integrate this now.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/29092#issuecomment-3727396701

Reply via email to