This PR proposes to rewrite the `StringSupport::chunkedStrlen*` methods and 
move them to the class `SegmentBulkOperations`, where the other bulk operations 
reside.

This PR fixes a bug in the `short_strlen` variant that affected odd offsets 
(`offset % 2 != 0`).
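
To illustrate why odd offsets need care in a 16-bit variant, here is a minimal 
sketch, not the code in this PR. The class `ShortStrlenSketch`, its method 
names and the use of a `byte[]` with `ByteBuffer` (instead of a memory segment) 
are assumptions made for the example. The 16-bit lanes are always counted from 
the start offset, so two adjacent zero bytes that straddle a lane boundary are 
not mistaken for a terminator.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ShortStrlenSketch {

    // True if any of the four 16-bit lanes of v is zero (SWAR zero test).
    static boolean containsZeroShort(long v) {
        return ((v - 0x0001000100010001L) & ~v & 0x8000800080008000L) != 0;
    }

    // Returns the length in bytes of the 16-bit-terminated string that starts
    // at fromOffset. fromOffset may be odd; lanes are relative to it.
    static int strlenShort(byte[] data, int fromOffset) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.nativeOrder());
        int offset = fromOffset;
        int bulkLimit = data.length - Long.BYTES;
        // Bulk loop: four 16-bit units per iteration, read unaligned.
        while (offset <= bulkLimit && !containsZeroShort(buf.getLong(offset))) {
            offset += Long.BYTES;
        }
        // Tail loop: one 16-bit unit at a time, same parity as fromOffset.
        for (; offset <= data.length - Short.BYTES; offset += Short.BYTES) {
            if (buf.getShort(offset) == 0) {
                return offset - fromOffset;
            }
        }
        throw new IllegalArgumentException("No 16-bit terminator found");
    }
}
```

Calling `strlenShort(data, 1)` scans units at byte offsets 1, 3, 5 and so on, 
never at even offsets.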

This PR also improves performance on modern hardware, where there is no need 
for an alignment pre-loop ahead of the bulk loop. Removing the pre-loop 
improves performance by about 30% for larger strings.
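
As a rough illustration of the idea, here is a minimal sketch, not the code in 
this PR, of a chunked byte `strlen` that starts its 8-byte bulk loop directly 
at the given offset, with no alignment pre-loop. The class 
`ChunkedStrlenSketch`, its method names and the use of a `byte[]` with 
`ByteBuffer` (instead of a memory segment) are assumptions made for the 
example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ChunkedStrlenSketch {

    // True if any of the eight bytes of v is zero (classic SWAR zero test).
    static boolean containsZeroByte(long v) {
        return ((v - 0x0101010101010101L) & ~v & 0x8080808080808080L) != 0;
    }

    // Returns the number of bytes before the first zero byte at or after
    // fromOffset.
    static int strlenByte(byte[] data, int fromOffset) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.nativeOrder());
        int offset = fromOffset;
        int bulkLimit = data.length - Long.BYTES;
        // Bulk loop: eight bytes per iteration, read at whatever alignment
        // fromOffset happens to have.
        while (offset <= bulkLimit && !containsZeroByte(buf.getLong(offset))) {
            offset += Long.BYTES;
        }
        // Tail loop: locate the exact terminator position byte by byte.
        for (; offset < data.length; offset++) {
            if (data[offset] == 0) {
                return offset - fromOffset;
            }
        }
        throw new IllegalArgumentException("No terminator found");
    }
}
```

The bulk loop only answers whether a chunk contains a terminator; the tail 
loop then finds its position, which keeps the sketch independent of byte order.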

Passes the `jdk_foreign` test suite.

Base:


Benchmark                               (size)  Mode  Cnt    Score   Error  Units
InternalStrLen.changedElementQuad            1  avgt   30    2.057 ± 0.012  ns/op
InternalStrLen.changedElementQuad            4  avgt   30    3.776 ± 0.031  ns/op
InternalStrLen.changedElementQuad           16  avgt   30    6.690 ± 0.060  ns/op
InternalStrLen.changedElementQuad          251  avgt   30   48.581 ± 0.764  ns/op
InternalStrLen.changedElementQuad         1024  avgt   30  196.188 ± 3.484  ns/op
InternalStrLen.chunkedDouble                 1  avgt   30    1.903 ± 0.013  ns/op
InternalStrLen.chunkedDouble                 4  avgt   30    3.446 ± 0.025  ns/op
InternalStrLen.chunkedDouble                16  avgt   30    5.759 ± 0.062  ns/op
InternalStrLen.chunkedDouble               251  avgt   30   26.892 ± 0.141  ns/op
InternalStrLen.chunkedDouble              1024  avgt   30   72.940 ± 1.562  ns/op
InternalStrLen.chunkedSingle                 1  avgt   30    1.897 ± 0.015  ns/op
InternalStrLen.chunkedSingle                 4  avgt   30    5.357 ± 0.560  ns/op
InternalStrLen.chunkedSingle                16  avgt   30    3.821 ± 0.052  ns/op
InternalStrLen.chunkedSingle               251  avgt   30   19.482 ± 0.190  ns/op
InternalStrLen.chunkedSingle              1024  avgt   30   38.938 ± 0.411  ns/op
InternalStrLen.chunkedSingleMisaligned       1  avgt   30    2.230 ± 0.147  ns/op
InternalStrLen.chunkedSingleMisaligned       4  avgt   30    5.424 ± 0.688  ns/op
InternalStrLen.chunkedSingleMisaligned      16  avgt   30    9.573 ± 0.063  ns/op
InternalStrLen.chunkedSingleMisaligned     251  avgt   30   22.242 ± 0.182  ns/op
InternalStrLen.chunkedSingleMisaligned    1024  avgt   30   45.442 ± 0.252  ns/op
InternalStrLen.elementByteMisaligned         1  avgt   30    1.616 ± 0.041  ns/op
InternalStrLen.elementByteMisaligned         4  avgt   30    2.982 ± 0.018  ns/op
InternalStrLen.elementByteMisaligned        16  avgt   30    8.662 ± 0.085  ns/op
InternalStrLen.elementByteMisaligned       251  avgt   30  126.644 ± 0.902  ns/op
InternalStrLen.elementByteMisaligned      1024  avgt   30  492.736 ± 3.254  ns/op
InternalStrLen.elementDouble                 1  avgt   30    1.900 ± 0.016  ns/op
InternalStrLen.elementDouble                 4  avgt   30    3.931 ± 0.027  ns/op
InternalStrLen.elementDouble                16  avgt   30   12.310 ± 0.109  ns/op
InternalStrLen.elementDouble               251  avgt   30  203.665 ± 6.778  ns/op
InternalStrLen.elementDouble              1024  avgt   30  786.320 ± 5.104  ns/op
InternalStrLen.elementQuad                   1  avgt   30    1.922 ± 0.031  ns/op
InternalStrLen.elementQuad                   4  avgt   30    4.078 ± 0.175  ns/op
InternalStrLen.elementQuad                  16  avgt   30   12.538 ± 0.330  ns/op
InternalStrLen.elementQuad                 251  avgt   30  202.175 ± 3.537  ns/op
InternalStrLen.elementQuad                1024  avgt   30  798.846 ± 7.323  ns/op
InternalStrLen.elementSingle                 1  avgt   30    1.614 ± 0.045  ns/op
InternalStrLen.elementSingle                 4  avgt   30    2.992 ± 0.010  ns/op
InternalStrLen.elementSingle                16  avgt   30    8.773 ± 0.095  ns/op
InternalStrLen.elementSingle               251  avgt   30  126.975 ± 1.201  ns/op
InternalStrLen.elementSingle              1024  avgt   30  499.057 ± 6.561  ns/op


Patch:


Benchmark                               (size)  Mode  Cnt    Score    Error  Units
InternalStrLen.changedElementQuad            1  avgt   30    1.386 ±  0.026  ns/op
InternalStrLen.changedElementQuad            4  avgt   30    3.467 ±  0.026  ns/op
InternalStrLen.changedElementQuad           16  avgt   30    5.551 ±  0.118  ns/op
InternalStrLen.changedElementQuad          251  avgt   30   37.737 ±  0.197  ns/op
InternalStrLen.changedElementQuad         1024  avgt   30  180.656 ±  3.920  ns/op
InternalStrLen.chunkedDouble                 1  avgt   30    2.211 ±  0.022  ns/op
InternalStrLen.chunkedDouble                 4  avgt   30    2.517 ±  0.024  ns/op
InternalStrLen.chunkedDouble                16  avgt   30    3.763 ±  0.024  ns/op
InternalStrLen.chunkedDouble               251  avgt   30   22.641 ±  0.790  ns/op
InternalStrLen.chunkedDouble              1024  avgt   30   67.525 ±  1.468  ns/op
InternalStrLen.chunkedSingle                 1  avgt   30    2.360 ±  0.187  ns/op
InternalStrLen.chunkedSingle                 4  avgt   30    2.846 ±  0.035  ns/op
InternalStrLen.chunkedSingle                16  avgt   30    3.450 ±  0.014  ns/op
InternalStrLen.chunkedSingle               251  avgt   30   12.478 ±  0.055  ns/op
InternalStrLen.chunkedSingle              1024  avgt   30   34.528 ±  0.338  ns/op
InternalStrLen.chunkedSingleMisaligned       1  avgt   30    2.505 ±  0.013  ns/op
InternalStrLen.chunkedSingleMisaligned       4  avgt   30    3.305 ±  0.063  ns/op
InternalStrLen.chunkedSingleMisaligned      16  avgt   30    3.962 ±  0.083  ns/op
InternalStrLen.chunkedSingleMisaligned     251  avgt   30   12.971 ±  0.396  ns/op
InternalStrLen.chunkedSingleMisaligned    1024  avgt   30   35.891 ±  1.048  ns/op
InternalStrLen.elementByteMisaligned         1  avgt   30    1.495 ±  0.042  ns/op
InternalStrLen.elementByteMisaligned         4  avgt   30    2.898 ±  0.079  ns/op
InternalStrLen.elementByteMisaligned        16  avgt   30    8.632 ±  0.212  ns/op
InternalStrLen.elementByteMisaligned       251  avgt   30  128.452 ±  3.210  ns/op
InternalStrLen.elementByteMisaligned      1024  avgt   30  508.724 ± 18.041  ns/op
InternalStrLen.elementDouble                 1  avgt   30    1.852 ±  0.092  ns/op
InternalStrLen.elementDouble                 4  avgt   30    3.838 ±  0.099  ns/op
InternalStrLen.elementDouble                16  avgt   30   12.361 ±  0.327  ns/op
InternalStrLen.elementDouble               251  avgt   30  206.742 ± 10.447  ns/op
InternalStrLen.elementDouble              1024  avgt   30  793.779 ±  7.499  ns/op
InternalStrLen.elementQuad                   1  avgt   30    1.790 ±  0.056  ns/op
InternalStrLen.elementQuad                   4  avgt   30    3.732 ±  0.009  ns/op
InternalStrLen.elementQuad                  16  avgt   30   12.067 ±  0.250  ns/op
InternalStrLen.elementQuad                 251  avgt   30  196.458 ±  2.688  ns/op
InternalStrLen.elementQuad                1024  avgt   30  811.230 ± 27.569  ns/op
InternalStrLen.elementSingle                 1  avgt   30    1.465 ±  0.036  ns/op
InternalStrLen.elementSingle                 4  avgt   30    2.875 ±  0.068  ns/op
InternalStrLen.elementSingle                16  avgt   30    8.568 ±  0.176  ns/op
InternalStrLen.elementSingle               251  avgt   30  126.605 ±  2.811  ns/op
InternalStrLen.elementSingle              1024  avgt   30  494.017 ± 10.106  ns/op

-------------

Commit messages:
 - Clean up
 - Add int method
 - Fix benchmark
 - Move and rewrite strlen methods

Changes: https://git.openjdk.org/jdk/pull/22451/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22451&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8345120
  Stats: 462 lines in 5 files changed: 212 ins; 183 del; 67 mod
  Patch: https://git.openjdk.org/jdk/pull/22451.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/22451/head:pull/22451

PR: https://git.openjdk.org/jdk/pull/22451
