> After finding a hash match, getEntryPos needs to compare the lookup name up > to the encoded entry name in the CEN. This comparison is done by decoding the > entry name into a String. The names can then be compared using the String > API. This decoding step adds a significat cost to this method. > > This PR suggest to update the string comparison such that in the common case > where both the lookup name and the entry name are encoded in > ASCII-compatible UTF-8, decoding can be avoided and the byte arrays can > instead be compared direcly. > > ZipCoder is updated with a new method to compare a string with an encoded > byte array range. The default implementation decodes to string (like the > current code), while the UTF-8 implementation uses > JavaLangAccess.getBytesNoRepl to get the bytes. Both methods thes uses > Arrays.mismatch for comparison with or without matching trailing slashes. > > Additionally, this PR suggest to make the following updates to getEntryPos: > > - The try/catch for IAE is redundant and can be safely removed. (initCEN > already checks this and will throws IAE for invalid UTF-8). This seems to > give a 3-4% speedup on micros) > - A new test InvalidBytesInEntryNameOrComment is a added to verify that > initCEN does in fact reject invalid UTF-8 in CEN file names and comments. (I > found no existing test coverage for this) > - The recursion when looking for "name/" matches is replaced with iteration. > We keep track of any "name/" match and return it at the end of the search. (I > feel this is easier to follow and it also gives a ~30% speedup for addSlash > lookups with no regression on regular lookups) > > (My though is that including these additional updates in this PR might reduce > reviewer overhead given that it touches the exact same code. I might be wrong > on this, please advise :) > > I'm seeing a ~17% saving on the micro ZipFileGetEntry.getEntryHit (modified > to use xalan.jar): > > Baseline: > > Benchmark (size) Mode Cnt Score Error > Units > ZipFileGetEntry.getEntryHit 512 avgt 15 74.941 ± 1.004 > ns/op > ZipFileGetEntry.getEntryHit 1024 avgt 15 84.943 ± 1.320 > ns/op > ZipFileGetEntry.getEntryHitUncached 512 avgt 15 120.371 ± 2.386 > ns/op > ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 126.128 ± 1.075 > ns/op > ZipFileGetEntry.getEntryMiss 512 avgt 15 23.818 ± 0.838 > ns/op > ZipFileGetEntry.getEntryMiss 1024 avgt 15 29.762 ± 5.998 > ns/op > ZipFileGetEntry.getEntryMissUncached 512 avgt 15 59.405 ± 0.545 > ns/op > ZipFileGetEntry.getEntryMissUncached 1024 avgt 15 71.840 ± 2.455 > ns/op > ZipFileGetEntry.getEntrySlash 512 avgt 15 135.621 ± 4.341 > ns/op > ZipFileGetEntry.getEntrySlash 1024 avgt 15 134.190 ± 2.141 > ns/op > > > > PR: > > > Benchmark (size) Mode Cnt Score Error > Units > ZipFileGetEntry.getEntryHit 512 avgt 15 62.267 ± 1.329 > ns/op > ZipFileGetEntry.getEntryHit 1024 avgt 15 72.916 ± 2.428 > ns/op > ZipFileGetEntry.getEntryHitUncached 512 avgt 15 101.630 ± 1.154 > ns/op > ZipFileGetEntry.getEntryHitUncached 1024 avgt 15 113.161 ± 0.502 > ns/op > ZipFileGetEntry.getEntryMiss 512 avgt 15 23.003 ± 1.191 > ns/op > ZipFileGetEntry.getEntryMiss 1024 avgt 15 23.236 ± 1.114 > ns/op > ZipFileGetEntry.getEntryMissUncached 512 avgt 15 56.781 ± 1.505 > ns/op > ZipFileGetEntry.getEntryMissUncached 1024 avgt 15 67.767 ± 1.963 > ns/op > ZipFileGetEntry.getEntrySlash 512 avgt 15 73.745 ± 2.717 > ns/op > ZipFileGetEntry.getEntrySlash 1024 avgt 15 75.784 ± 1.051 > ns/op > > > To assess the impact on startup/warmup, I made a main method class which > measures the total time of calling ZipFile.getEntry for all entries in the > 109 JAR file dependenies of spring-petclinic. The shows a nice improvement > (time in micros): > > > Percentile Baseline Patch > 50 % 23155 21149 > 75 % 23598 21454 > 90 % 23989 21691 > 95 % 24238 21973 > 99 % 25270 22446 > STDEV 792 549 > Count 500 500
Eirik Bjorsnos has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 20 additional commits since the last revision: - Merge branch 'master' into getentrypos-prefixmatch - Fix incorrect offset for the CEN "length of extra field". Fixed spelling of "invalid". - Spelling fix in test data for non-ascii latin1 string - Replace new shared secret with using getBytesNoRepl in UTF8ZipCoder.compare. Add a test case for UTF-8 encoded entry name which is latin1, but not ASCII - Rename test to InvalidBytesInEntryNameOrComment - Adjust whitespace - Some minor improvements to code comments - Micros seem to indicate that the range checks in Arrays.mismatch might have as much as 5% regression - Move String/byte[] comparison into ZipCoder. Change the default implementation to decode to string for comparison instead of encoding to bytes, this seems safer. Revert some changes from previous commits to parameters in the hasTrailingSlash method. - ByteBuffers for reading ZIP files must be little-endian - ... and 10 more: https://git.openjdk.org/jdk/compare/23f44474...06047377 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12290/files - new: https://git.openjdk.org/jdk/pull/12290/files/2c5e7c2c..06047377 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12290&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12290&range=02-03 Stats: 22651 lines in 906 files changed: 9116 ins; 4377 del; 9158 mod Patch: https://git.openjdk.org/jdk/pull/12290.diff Fetch: git fetch https://git.openjdk.org/jdk pull/12290/head:pull/12290 PR: https://git.openjdk.org/jdk/pull/12290