Re: RFR: 8342040: Further improve entry lookup performance for multi-release JARs [v3]

Claes Redestad Fri, 18 Oct 2024 07:02:26 -0700

On Thu, 17 Oct 2024 22:32:12 GMT, Claes Redestad <[email protected]> wrote:


>>> I guess performance leans on there being only one or a small number of 
>>> versions per entry. 
>> 
>> I checked Spring Petclinic:
>> 
>> * 109 JAR dependencies
>> * 15 are multi-release enabled.
>> * The 15 MR-JARs have 9347 entries, of which 21 are versioned
>> * 9 of 15 MR JARs use the feature exclusively to hide `module-info.class` 
>> from Java 8.
>> * 3 of the 6 remaining JAR files version a single class file
>> * The remaining JARs mostly version on the 9 release.
>> * The maximum number of versioned entries in a file is 5.
>> * Zero entries have more than a single version.
>> 
>>> Which I think is a fair assumption. Still would be good to have a stress 
>>> test with many entries having many versions and make sure we don't regress 
>>> such cases too much, no?
>> 
>> Such a test would be somewhat speculative. If I add 5 versions of each entry 
>> in `ZipFileOpen`, then yes, this PR sees a regression over mainline:
>> 
>> Baseline:
>> 
>> Benchmark                     (size)  Mode  Cnt       Score       Error  
>> Units
>> ZipFileOpen.openCloseZipFile    1024  avgt   15  718234.380 ? 29454.645  
>> ns/op
>> 
>> 
>> PR:
>> 
>> Benchmark                     (size)  Mode  Cnt        Score       Error  
>> Units
>> ZipFileOpen.openCloseZipFile    1024  avgt   15  1284944.601 ? 74919.522  
>> ns/op
>> 
>> 
>> While `JarFileGetEntry` improves over baseline:
>> 
>> Baseline:
>> 
>> Benchmark                              (mr)  (size)  Mode  Cnt    Score   
>> Error  Units
>> JarFileGetEntry.getEntryHit           false    1024  avgt   15   68.984 ? 
>> 2.791  ns/op
>> JarFileGetEntry.getEntryHit            true    1024  avgt   15  324.763 ? 
>> 9.690  ns/op
>> JarFileGetEntry.getEntryHitUncached   false    1024  avgt   15   95.884 ? 
>> 1.404  ns/op
>> JarFileGetEntry.getEntryHitUncached    true    1024  avgt   15  333.491 ? 
>> 3.576  ns/op
>> JarFileGetEntry.getEntryMiss          false    1024  avgt   15   37.358 ? 
>> 2.215  ns/op
>> JarFileGetEntry.getEntryMiss           true    1024  avgt   15  650.467 ? 
>> 5.116  ns/op
>> JarFileGetEntry.getEntryMissUncached  false    1024  avgt   15   61.059 ? 
>> 0.382  ns/op
>> JarFileGetEntry.getEntryMissUncached   true    1024  avgt   15  678.457 ? 
>> 4.390  ns/op
>> 
>> 
>> PR:
>> 
>> 
>> Benchmark                              (mr)  (size)  Mode  Cnt    Score    
>> Error  Units
>> JarFileGetEntry.getEntryHit           false    1024  avgt   15   61.333 ?  
>> 3.665  ns/op
>> JarFileGetEntry.getEntryHit            true    1024  avgt   15  153.987 ?  
>> 8.889  ns/op
>> JarFileGetEntry.getEntryHitUncached   false    1024  avgt   15   90.008 ? ...
>
> I poked around with an alternative approach that uses `java.util.BitSet`. 
> This also allows us to use the map we build up as-is, avoids sorting et.c. 
> Performance on `JarFileGetEntry` (#21489 version) is the same in my testing, 
> but performance on `ZipFileOpen` with your 5 versions of each entry stress 
> test should be much improved (might beat the baseline). Can you share your 
> local changes to that micro in a gist or e-mail so I can verify?

With this change to `ZipFileOpen`: 

diff --git a/test/micro/org/openjdk/bench/java/util/zip/ZipFileOpen.java 
b/test/micro/org/openjdk/bench/java/util/zip/ZipFileOpen.java
index 4f6ae6373ec..0e22cf7f0ab 100644
--- a/test/micro/org/openjdk/bench/java/util/zip/ZipFileOpen.java
+++ b/test/micro/org/openjdk/bench/java/util/zip/ZipFileOpen.java
@@ -73,6 +73,11 @@ public void beforeRun() throws IOException {

                 ename += entryPrefixes[i % entryPrefixes.length] + "-" + i;
                 zos.putNextEntry(new ZipEntry(ename));
+                zos.putNextEntry(new ZipEntry("META-INF/versions/9/" + ename));
+                zos.putNextEntry(new ZipEntry("META-INF/versions/11/" + 
ename));
+                zos.putNextEntry(new ZipEntry("META-INF/versions/17/" + 
ename));
+                zos.putNextEntry(new ZipEntry("META-INF/versions/21/" + 
ename));
+                zos.putNextEntry(new ZipEntry("META-INF/versions/24/" + 
ename));
             }
         }
         zipFile = tempFile;


On that modified `ZipFileOpen` micro,
https://github.com/eirbjo/jdk/pull/1 improves over #21489 but still regress 
compared to the baseline:


Benchmark                       (size)  Mode  Cnt        Score       Error  
Units
ZipFileOpen.openCloseZipFile      1024  avgt   15   677386,387 ± 15761,989  
ns/op # baseline
ZipFileOpen.openCloseZipFile      1024  avgt   15  1121127,629 ± 16716,218  
ns/op # pull/21489, 0.6x
ZipFileOpen.openCloseZipFile      1024  avgt   15   968501,630 ± 28761,067  
ns/op # eirbjo/jdk#1, 0.7x


I suspect remaining regression is mostly from the added `String` 
creation/processing in `initCEN`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21489#discussion_r1806179709

Re: RFR: 8342040: Further improve entry lookup performance for multi-release JARs [v3]

Reply via email to