On Fri, 26 Jan 2024 17:19:25 GMT, Srinivas Vamsi Parasa <d...@openjdk.org> wrote:
>> Hello Vamsi (@vamsi-parasa), >> >> Could you please run the benchmarking of new DQPS in your environment with >> AVX? >> >> Take all classes below and put them in the package >> org.openjdk.bench.java.util. >> ArraysSort class contains all tests for the new versions and ready to use. >> (it will run all tests in one execution). >> >> https://github.com/iaroslavski/sorting/blob/master/radixsort/ArraysSort.java >> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_a15.java >> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r20s.java >> https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_r20p.java >> >> Many thanks, >> Vladimir > > Hi Vladimir (@iaroslavski), > > Please see the JMH data below. > > Thanks, > Vamsi > > Benchmark (builder) (size) Mode Cnt Score Error > Units > ArraysSort.Int.a15 RANDOM 600 avgt 4 7.096 ± 0.081 > us/op > ArraysSort.Int.a15 RANDOM 2000 avgt 4 44.014 ± 1.717 > us/op > ArraysSort.Int.a15 RANDOM 90000 avgt 4 4451.444 ± 71.168 > us/op > ArraysSort.Int.a15 RANDOM 400000 avgt 4 22751.966 ± 683.597 > us/op > ArraysSort.Int.a15 RANDOM 3000000 avgt 4 190326.306 ± 8008.512 > us/op > ArraysSort.Int.a15 REPEATED 600 avgt 4 1.044 ± 0.016 > us/op > ArraysSort.Int.a15 REPEATED 2000 avgt 4 2.272 ± 0.287 > us/op > ArraysSort.Int.a15 REPEATED 90000 avgt 4 412.331 ± 11.656 > us/op > ArraysSort.Int.a15 REPEATED 400000 avgt 4 1908.978 ± 30.241 > us/op > ArraysSort.Int.a15 REPEATED 3000000 avgt 4 15163.443 ± 100.425 > us/op > ArraysSort.Int.a15 STAGGER 600 avgt 4 1.055 ± 0.057 > us/op > ArraysSort.Int.a15 STAGGER 2000 avgt 4 3.408 ± 0.096 > us/op > ArraysSort.Int.a15 STAGGER 90000 avgt 4 149.220 ± 4.022 > us/op > ArraysSort.Int.a15 STAGGER 400000 avgt 4 663.096 ± 30.252 > us/op > ArraysSort.Int.a15 STAGGER 3000000 avgt 4 5206.890 ± 234.857 > us/op > ArraysSort.Int.a15 SHUFFLE 600 avgt 4 4.611 ± 0.118 > us/op > ArraysSort.Int.a15 SHUFFLE 2000 avgt 4 17.955 ± 0.356 > us/op > ArraysSort.Int.a15 SHUFFLE 90000 avgt 4 1410.357 ± 41.128 > us/op > ArraysSort.Int.a15 SHUFFLE 400000 avgt 4 5739.311 ± 128.270 > us/op > ArraysSort.Int.a15 SHUFFLE 3000000 avgt 4 41501.980 ± 829.443 > us/op > ArraysSort.Int.jdk RANDOM 600 avgt 4 1.612 ± 0.088 > us/op > ArraysSort.Int.jdk RANDOM 2000 avgt 4 6.893 ± 0.375 > us/op > ArraysSort.Int.jdk RANDOM 90000 avgt 4 522.749 ± 19.386 > us/op > ArraysSort.Int.jdk RANDOM 400000 avgt 4 2424.204 ± 63.844 > us/op > ArraysSort.Int.jdk RANDOM 3000000 avgt 4 21000.434 ± 801.315 > us/op > ArraysSort.Int.jdk REPEATED 600 avgt 4 0.496 ± 0.030 > us/op > ArraysSort.Int.jdk REPEATED 2000 avgt 4 1.037 ± 0.083 > us/op > ArraysSort.Int.jdk REPEATED 90000 avgt 4 57.763 ± 1.955 > us/op > ArraysSort.Int.jd... Hi Vamsi (@vamsi-parasa), Laurent(@bourgesl), The latest benchmarking compares compares the following versions: jdk - direct call of Arrays.sort(); a15 - the current source of DualPivotQuicksort from the latest build (except renaming) https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/DualPivotQuicksort.java https://github.com/iaroslavski/sorting/blob/master/radixsort/DualPivotQuicksort_a15.java r20s - new version without Radix sort r20p - new version with Radix sort in parallel case only It is expected that timing of jdk and a15 should be more or less the same, but please look at the results: Benchmark | Data Type | Array Size | Arrays.sort() from jdk | Current source (a15) -- | -- | -- | -- | -- ArraysSort.Int.testSort | RANDOM | 600 | 1.612 | 7.096 ArraysSort.Int.testSort | RANDOM | 2000 | 6.893 | 44.014 ArraysSort.Int.testSort | RANDOM | 90000 | 522.749 | 4451.444 ArraysSort.Int.testSort | RANDOM | 400000 | 2424.204 | 22751.966 ArraysSort.Int.testSort | RANDOM | 3000000 | 21000.434 | 190326.306 ArraysSort.Int.testSort | REPEATED | 600 | 0.496 | 1.044 ArraysSort.Int.testSort | REPEATED | 2000 | 1.037 | 2.272 ArraysSort.Int.testSort | REPEATED | 90000 | 57.763 | 412.331 ArraysSort.Int.testSort | REPEATED | 400000 | 182.238 | 1908.978 ArraysSort.Int.testSort | REPEATED | 3000000 | 1708.082 | 15163.443 ArraysSort.Int.testSort | STAGGER | 600 | 1.038 | 1.055 ArraysSort.Int.testSort | STAGGER | 2000 | 3.434 | 3.408 ArraysSort.Int.testSort | STAGGER | 90000 | 148.638 | 149.220 ArraysSort.Int.testSort | STAGGER | 400000 | 663.076 | 663.096 ArraysSort.Int.testSort | STAGGER | 3000000 | 5212.821 | 5206.890 ArraysSort.Int.testSort | SHUFFLE | 600 | 1.926 | 4.611 ArraysSort.Int.testSort | SHUFFLE | 2000 | 6.858 | 17.955 ArraysSort.Int.testSort | SHUFFLE | 90000 | 473.441 | 1410.357 ArraysSort.Int.testSort | SHUFFLE | 400000 | 2153.779 | 5739.311 ArraysSort.Int.testSort | SHUFFLE | 3000000 | 18180.141 | 41501.980 You can see that a15 (current source) works extremly slower than Arrays.sort(), but the code is the same with minor differences: renaming and repackaging (I put the class to the test package org.openjdk.bench.java.util). How does other package org.openjdk.bench.java.util effect so much? I use this pom.xml file https://github.com/iaroslavski/sorting/blob/master/radixsort/pom.xml and run as following script: `mvn clean package` `java --add-exports=java.base/jdk.internal.vm.annotation=ALL-UNNAMED --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED -jar target/benchmarks.jar org.openjdk.bench.java.util.ArraysSort` It looks like intrinsics have not been applied. Check STAGGER inputs: you can see the same results because STAGGER switches to merging sort wich doesn't contain intrinsics, but other inputs have different behaviour. Therefore, I compare a15 (current sources) and the new version (r20p). we can see the new version faster (only one anomaly on REPEATED, 2000). Benchmark | Data Type | Array Size | Current source (a15) | New version (r20p) | Speedup -- | -- | -- | -- | -- | -- ArraysSort.Int.testSort | RANDOM | 600 | 7.096 | 4.343 | 1.63 ArraysSort.Int.testSort | RANDOM | 2000 | 44.014 | 20.323 | 2.17 ArraysSort.Int.testSort | RANDOM | 90000 | 4451.444 | 4073.793 | 1.09 ArraysSort.Int.testSort | RANDOM | 400000 | 22751.966 | 19768.316 | 1.15 ArraysSort.Int.testSort | RANDOM | 3000000 | 190326.306 | 165296.770 | 1.15 ArraysSort.Int.testSort | REPEATED | 600 | 1.044 | 1.048 | 1.00 ArraysSort.Int.testSort | REPEATED | 2000 | 2.272 | 3.212 | 0.71 ArraysSort.Int.testSort | REPEATED | 90000 | 412.331 | 410.775 | 1.00 ArraysSort.Int.testSort | REPEATED | 400000 | 1908.978 | 1869.529 | 1.02 ArraysSort.Int.testSort | REPEATED | 3000000 | 15163.443 | 14302.261 | 1.06 ArraysSort.Int.testSort | STAGGER | 600 | 1.055 | 0.808 | 1.31 ArraysSort.Int.testSort | STAGGER | 2000 | 3.408 | 2.683 | 1.27 ArraysSort.Int.testSort | STAGGER | 90000 | 149.220 | 121.519 | 1.23 ArraysSort.Int.testSort | STAGGER | 400000 | 663.096 | 548.277 | 1.21 ArraysSort.Int.testSort | STAGGER | 3000000 | 5206.890 | 4496.204 | 1.16 ArraysSort.Int.testSort | SHUFFLE | 600 | 4.611 | 3.383 | 1.36 ArraysSort.Int.testSort | SHUFFLE | 2000 | 17.955 | 11.400 | 1.57 ArraysSort.Int.testSort | SHUFFLE | 90000 | 1410.357 | 1001.561 | 1.41 ArraysSort.Int.testSort | SHUFFLE | 400000 | 5739.311 | 4667.466 | 1.23 ArraysSort.Int.testSort | SHUFFLE | 3000000 | 41501.980 | 36284.178 | 1.14 **Parallel sort** Benchmark | Data Type | Array Size | Current source (a15) | New version (r20p) | Speedup -- | -- | -- | -- | -- | -- ArraysSort.Int.testParallelSort | RANDOM | 600 | 7.102 | 4.315 | 1.65 ArraysSort.Int.testParallelSort | RANDOM | 2000 | 43.708 | 20.322 | 2.15 ArraysSort.Int.testParallelSort | RANDOM | 90000 | 743.725 | 409.892 | 1.81 ArraysSort.Int.testParallelSort | RANDOM | 400000 | 3099.628 | 1322.129 | 2.34 ArraysSort.Int.testParallelSort | RANDOM | 3000000 | 25830.847 | 9790.138 | 2.64 ArraysSort.Int.testParallelSort | REPEATED | 600 | 1.042 | 1.033 | 1.01 ArraysSort.Int.testParallelSort | REPEATED | 2000 | 2.273 | 3.247 | 0.70 ArraysSort.Int.testParallelSort | REPEATED | 90000 | 204.887 | 184.565 | 1.11 ArraysSort.Int.testParallelSort | REPEATED | 400000 | 758.837 | 745.817 | 1.02 ArraysSort.Int.testParallelSort | REPEATED | 3000000 | 7635.330 | 5881.615 | 1.30 ArraysSort.Int.testParallelSort | STAGGER | 600 | 1.045 | 0.807 | 1.29 ArraysSort.Int.testParallelSort | STAGGER | 2000 | 3.410 | 2.733 | 1.25 ArraysSort.Int.testParallelSort | STAGGER | 90000 | 88.093 | 73.869 | 1.19 ArraysSort.Int.testParallelSort | STAGGER | 400000 | 218.848 | 208.361 | 1.05 ArraysSort.Int.testParallelSort | STAGGER | 3000000 | 2245.299 | 2122.314 | 1.06 ArraysSort.Int.testParallelSort | SHUFFLE | 600 | 4.594 | 3.386 | 1.36 ArraysSort.Int.testParallelSort | SHUFFLE | 2000 | 17.759 | 11.570 | 1.53 ArraysSort.Int.testParallelSort | SHUFFLE | 90000 | 293.962 | 146.273 | 2.01 ArraysSort.Int.testParallelSort | SHUFFLE | 400000 | 1017.350 | 766.287 | 1.33 ArraysSort.Int.testParallelSort | SHUFFLE | 3000000 | 7419.434 | 5917.997 | 1.25 Vamsi (@vamsi-parasa), Could you please check your pom.xml and script to be sure that we use the same environment? Are intrinsics not applied in package org.openjdk.bench.java.util? Can you rerun benchmarking when classes a15, r20s and r20p are under package java.util? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13568#issuecomment-1913741195