On Wed, 30 Apr 2025 14:14:28 GMT, Jaikiran Pai <j...@openjdk.org> wrote:
>> Can I please get a review of this change which proposes to fix an issue >> `java.util.zip.ZipFile` which would cause failures when multiple instances >> of `ZipFile` using non-UTF8 `Charset` were operating against the same >> underlying ZIP file? This addresses >> https://bugs.openjdk.org/browse/JDK-8347712. >> >> ZIP file specification allows for ZIP entries to mark a `UTF-8` flag to >> indicate that the entry name and comment are encoded using UTF8. A >> `java.util.zip.ZipFile` can be constructed by passing it a `Charset`. This >> `Charset` (which defaults to UTF-8) gets used for decoding entry names and >> comments for non-UTF8 entries. >> >> The internal implementation of `ZipFile` uses a `ZipCoder` (backed by >> `java.nio.charset.CharsetEncoder/CharsetDecoder` instance) for the given >> `Charset`. Except for UTF8 `ZipCoder`, other `ZipCoder`s are not thread safe. >> >> The internal implementation of `ZipFile` maintains a cache of >> `ZipFile$Source`. A `Source` corresponds to the underlying ZIP file and >> during construction, uses a `ZipCoder` for parsing the ZIP entries and once >> constructed holds on to the parsed ZIP structure. Multiple instances of a >> `ZipFile` which all correspond to the same ZIP file on the filesystem, share >> a single instance of `Source` (after the `Source` has been constructed and >> cached). Although `ZipFile` instances aren't expected to be thread-safe, the >> fact that multiple different instances of `ZipFile` could be sharing the >> same instance of `Source` in concurrent threads, mandates that the `Source` >> must be thread-safe. >> >> In Java 15, we did a performance optimization through >> https://bugs.openjdk.org/browse/JDK-8243469. As part of that change, we >> started holding on to the `ZipCoder` instance (corresponding to the >> `Charset` provided during `ZipFile` construction) in the `Source`. This >> stored `ZipCoder` was then used for `ZipFile` operations when working with >> the ZIP entries. As noted previously, any non-UTF8 `ZipCoder` is not >> thread-safe and as a result, any usages of `ZipCoder` in the `Source` makes >> `Source` not thread-safe too. That effectively violates the requirement that >> `Source` must be thread-safe to allow for its usage in multiple different >> `ZipFile` instances concurrently. This then causes `ZipFile` usages to fail >> in unexpected ways like the one shown in the linked >> https://bugs.openjdk.org/browse/JDK-8347712. >> >> The commit in this PR addresses the issue by not maintaining `ZipCoder` as a >> instance field of `Source`. Instead the `ZipCoder` is now mainta... > > Jaikiran Pai has updated the pull request incrementally with one additional > commit since the last revision: > > Lance's review - update code comment in the test Great work, I have no further input. Reviewing this part of ZipFile reminds me that this codes begs for some refactoring. Neither the RandomAccessFile nor the CEN bytes change between different charsets so I believe there may be potential for some untangling and simplification here. But that’s for another PR :) ------------- Marked as reviewed by eirbjo (Committer). PR Review: https://git.openjdk.org/jdk/pull/23986#pullrequestreview-2814432670