On Fri, 18 Oct 2024 16:02:26 GMT, Eirik Bjørsnøs <eir...@openjdk.org> wrote:

>> src/java.base/share/classes/java/util/zip/ZipFile.java line 1796:
>> 
>>> 1794:                                 if (metaVersions == null)
>>> 1795:                                     metaVersions = new HashMap<>();
>>> 1796:                                 
>>> metaVersions.computeIfAbsent(hashCode, _ -> new BitSet()).set(version);
>> 
>> Does this hashCode ever collide? Tried reading ZipCoder and can't rule that 
>> out. If yes, we need a fallback mechanism if this `computeIfAbsent` 
>> encountered an existing entry.
>
> Yes it may collide. This is an overapproximation of the version set where a 
> hash collision just results in an extra lookup being performed for a version 
> entry which does not exist for that name. See the comment on the 
> `metaVersions` field declaration.
> 
> This is a trade-off against the initial alternative proposed in this PR which 
> was to map using the entry name string. That would minimize lookups, but it 
> would also require allocating the String name and storing those strings in 
> the map. So we decided a few extra lookups on hash collisions was an okay 
> trade-off for faster parsing and less memory used.   
> 
> In fact, if we a use constant hash code like '1', such that _all_ hash codes 
> collide, we get the behavior of the current mainline, where a search for any 
> name will result in searches being performed for each version discovered in 
> the ZIP.

In principle the `metaVersions` map acts as a filter to reduce entry lookups, 
so it's ok to give false positives but not false negatives. 

Other alternatives such as mapping from entry length to a `BitSet` would also 
be semantically fine, would carry less of a footprint and 
per-versioned-entry-`initCEN` overhead, but might degrade to more false 
positives as the number of versioned entries in a jar file grows.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21489#discussion_r1812437330

Reply via email to