On Wed, 30 Apr 2025 14:14:28 GMT, Jaikiran Pai <j...@openjdk.org> wrote:

>> Can I please get a review of this change which proposes to fix an issue 
>> `java.util.zip.ZipFile` which would cause failures when multiple instances 
>> of `ZipFile` using non-UTF8 `Charset` were operating against the same 
>> underlying ZIP file? This addresses 
>> https://bugs.openjdk.org/browse/JDK-8347712.
>> 
>> ZIP file specification allows for ZIP entries to mark a `UTF-8` flag to 
>> indicate that the entry name and comment are encoded using UTF8. A 
>> `java.util.zip.ZipFile` can be constructed by passing it a `Charset`. This 
>> `Charset` (which defaults to UTF-8) gets used for decoding entry names and 
>> comments for non-UTF8 entries.
>> 
>> The internal implementation of `ZipFile` uses a `ZipCoder` (backed by 
>> `java.nio.charset.CharsetEncoder/CharsetDecoder` instance) for the given 
>> `Charset`. Except for UTF8 `ZipCoder`, other `ZipCoder`s are not thread safe.
>> 
>> The internal implementation of `ZipFile` maintains a cache of 
>> `ZipFile$Source`. A `Source` corresponds to the underlying ZIP file and 
>> during construction, uses a `ZipCoder` for parsing the ZIP entries and once 
>> constructed holds on to the parsed ZIP structure. Multiple instances of a 
>> `ZipFile` which all correspond to the same ZIP file on the filesystem, share 
>> a single instance of `Source` (after the `Source` has been constructed and 
>> cached). Although `ZipFile` instances aren't expected to be thread-safe, the 
>> fact that multiple different instances of `ZipFile` could be sharing the 
>> same instance of `Source` in concurrent threads, mandates that the `Source` 
>> must be thread-safe.
>> 
>> In Java 15, we did a performance optimization through 
>> https://bugs.openjdk.org/browse/JDK-8243469. As part of that change, we 
>> started holding on to the `ZipCoder` instance (corresponding to the 
>> `Charset` provided during `ZipFile` construction) in the `Source`. This 
>> stored `ZipCoder` was then used for `ZipFile` operations when working with 
>> the ZIP entries. As noted previously, any non-UTF8 `ZipCoder` is not 
>> thread-safe and as a result, any usages of `ZipCoder` in the `Source` makes 
>> `Source` not thread-safe too. That effectively violates the requirement that 
>> `Source` must be thread-safe to allow for its usage in multiple different 
>> `ZipFile` instances concurrently. This then causes `ZipFile` usages to fail 
>> in unexpected ways like the one shown in the linked 
>> https://bugs.openjdk.org/browse/JDK-8347712.
>> 
>> The commit in this PR addresses the issue by not maintaining `ZipCoder` as a 
>> instance field of `Source`. Instead the `ZipCoder` is now mainta...
>
> Jaikiran Pai has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Lance's review - update code comment in the test

Hello Claes, thank you for the review.

> While this PR fixes the bug reported in JDK-8355975 I'm not sure how the 
> changes in JDK-8243254 are to blame for that particular bug. This since no 
> aspect of the charset was part of Key before. 

You are right and I've updated the JBS issue to correct myself that 8355975 
isn't the cause for this and it likely exists even in older versions. I 
overlooked that even though 8355975 introduced the charset to be part of the 
`Key`, the underlying issue relates to considering two `ZipFile` instances 
opened with different `Charset`s as having the same entry names.

> The provided test case fails in an unrelated way (unmappable character) when 
> I try building and testing on JDK 8 and 11, though. It would be good to make 
> sure this is not an issue on older JDKs?

I will take a look at why it fails with that error on those versions and see if 
the test can be adjusted (outside of this PR) for those releases.

Thank you everyone for the reviews and the inputs. Alan had started reviewing 
this last week, so I'll wait for his review before integrating this.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23986#issuecomment-2854182588

Reply via email to