On Fri, 26 Sep 2025 22:39:29 GMT, Vladimir Yaroslavskiy <[email protected]> wrote:
>> The goal of the PR is to improve both, sequential and parallel, sorting of >> primitives. >> >> **The main achievements** >> >> - introduced Radix in parallel sorting which shows several times boost of >> performance and has linear complexity instead of n*ln(n) >> - improved mixed insertion sort (makes whole sorting faster) >> - improved merging sort for almost sorted data >> - optimized parallel sorting >> - improved step for pivot candidates and pivot partitioning >> - suggested better buffer allocation: if no memory, it is switched to >> in-place sorting with no OutOfMemoryError >> - minor javadoc and comment changes >> >> - extended existing tests >> - added tests for radix sort, heap sort, insertion sort >> - added benchmarking JMH tests >> - improved test coverage >> >> **The summary of benchmarking:** >> >> **Sequential sorting (Arrays.sort)** >> >> byte: up to 50% faster >> char: 4-7 times faster >> short: 2-6 times faster >> int: 1.2-5 times faster >> long: 1.2-5 times faster >> float: 1.2-5 times faster >> double: 1.2-4 times faster >> >> **Parallel sorting (Arrays.parallelSort)** >> >> int: 1.2-9 times faster >> long: 1.2-9 times faster >> float: 1.2-4 times faster >> double: 1.2-4 times faster >> >> **AVX512 support** >> >> Vamsi Parasa suggested faster sort routines by taking advantage of AVX512 >> instructions, see https://github.com/openjdk/jdk/pull/14227, sources of >> sorting were modified. Therefore, I performed benchmarking of the final >> version (which includes optimizations by Vamsi Parasa and optimizations from >> my side) on a server with CPUs supported AVX512 instructions, no regression >> of performance was found, see detailed benchmarking results. >> >> **High-level chart showing how the actual sorting algorithm is selected >> based on parallel/sequential and input size** >> >> **int/long/float/double** >> >> if size < MAX_INSERTION_SORT_SIZE(=51) => [ mixed ] insertion sort >> if array is almost sorted and size > MIN_MERGING_SORT_SIZE(=512) => merging >> sort >> if recursion depth > MAX_RECURSION_DEPTH(=64) => heap sort >> if random data => two pivots quicksort, else => one pivot quicksort >> >> **byte** >> >> if size < MAX_INSERTION_SORT_SIZE(=51) => insertion sort >> else => counting sort >> >> **char/short** >> >> if size > MIN_COUNTING_SORT_SIZE(640) => counting sort >> if size < MAX_INSERTION_SORT_SIZE(=51) => insertion sort >> if recursion depth > MAX_RECURSION_DEPTH(=64) => counting sort >> if random data => two pivots quicksort, else => one pivot quicksort >> >> **parallel sorting (int/lo... > > Vladimir Yaroslavskiy has updated the pull request incrementally with one > additional commit since the last revision: > > JDK-8266431: Dual-Pivot Quicksort improvements > > * Moved Radix sort out from sorting I moved Radix sort out from sources and re-run benchmarking of parallel sorting. We can see that new parallel sorting is still faster than existing version in JDK, no degradation. **Parallel sorting (without Radix sort)** Benchmark | Data Type | Array Size | Baseline (us/op) | New version (us/op) | Speedup -- | -- | -- | -- | -- | -- int | RANDOM | 600 | 12,852 | 8,507 | **1.51** int | RANDOM | 3000 | 168,764 | 152,866 | **1.10** int | RANDOM | 40000 | 987,272 | 647,195 | **1.53** int | RANDOM | 800000 | 19120,539 | 13251,660 | **1.44** int | RANDOM | 5000000 | 133598,983 | 90992,272 | **1.47** int | SHUFFLE | 600 | 7,165 | 5,136 | **1.40** int | SHUFFLE | 3000 | 56,363 | 39,100 | **1.44** int | SHUFFLE | 40000 | 454,965 | 233,097 | **1.95** int | SHUFFLE | 800000 | 7087,698 | 3153,941 | **2.25** int | SHUFFLE | 5000000 | 32361,302 | 22618,240 | **1.43** int | STAGGER | 600 | 12,499 | 2,351 | **5.32** int | STAGGER | 3000 | 18,045 | 11,093 | **1.63** int | STAGGER | 40000 | 177,705 | 91,101 | **1.95** int | STAGGER | 800000 | 3735,779 | 1078,546 | **3.46** int | STAGGER | 5000000 | 25432,573 | 9705,696 | **2.62** int | REPEATED | 600 | 1,835 | 1,696 | **1.08** int | REPEATED | 3000 | 19,594 | 16,612 | **1.18** int | REPEATED | 40000 | 433,101 | 181,605 | **2.38** int | REPEATED | 800000 | 8680,406 | 2206,406 | **3.93** int | REPEATED | 5000000 | 46452,021 | 16940,200 | **2.74** -- | -- | -- | -- | -- | -- long | RANDOM | 600 | 12,967 | 8,352 | **1.55** long | RANDOM | 3000 | 170,926 | 150,999 | **1.13** long | RANDOM | 40000 | 1048,627 | 666,233 | **1.57** long | RANDOM | 800000 | 19057,260 | 13713,901 | **1.39** long | RANDOM | 5000000 | 140089,309 | 93820,320 | **1.49** long | SHUFFLE | 600 | 7,284 | 5,401 | **1.35** long | SHUFFLE | 3000 | 54,013 | 39,516 | **1.37** long | SHUFFLE | 40000 | 454,804 | 255,803 | **1.78** long | SHUFFLE | 800000 | 7222,403 | 3670,639 | **1.97** long | SHUFFLE | 5000000 | 33815,535 | 31767,738 | **1.06** long | STAGGER | 600 | 12,821 | 2,311 | **5.55** long | STAGGER | 3000 | 19,027 | 11,073 | **1.72** long | STAGGER | 40000 | 188,118 | 111,978 | **1.68** long | STAGGER | 800000 | 4763,922 | 1630,149 | **2.92** long | STAGGER | 5000000 | 32556,841 | 19882,107 | **1.64** long | REPEATED | 600 | 1,961 | 1,770 | **1.11** long | REPEATED | 3000 | 19,678 | 9,766 | **2.01** long | REPEATED | 40000 | 439,893 | 201,826 | **2.18** long | REPEATED | 800000 | 8725,427 | 2817,883 | **3.10** long | REPEATED | 5000000 | 47688,415 | 26917,131 | **1.77** -- | -- | -- | -- | -- | -- float | RANDOM | 600 | 13,048 | 10,256 | **1.27** float | RANDOM | 3000 | 175,047 | 167,967 | **1.04** float | RANDOM | 40000 | 1089,859 | 740,659 | **1.47** float | RANDOM | 800000 | 19919,657 | 15086,025 | **1.32** float | RANDOM | 5000000 | 143704,590 | 105425,171 | **1.36** float | SHUFFLE | 600 | 7,526 | 6,214 | **1.21** float | SHUFFLE | 3000 | 54,785 | 49,150 | **1.11** float | SHUFFLE | 40000 | 475,588 | 289,907 | **1.64** float | SHUFFLE | 800000 | 7221,754 | 4124,313 | **1.75** float | SHUFFLE | 5000000 | 35369,094 | 28281,396 | **1.25** float | STAGGER | 600 | 13,956 | 2,936 | **4.75** float | STAGGER | 3000 | 23,035 | 13,192 | **1.75** float | STAGGER | 40000 | 230,380 | 133,386 | **1.73** float | STAGGER | 800000 | 4614,573 | 1736,213 | **2.66** float | STAGGER | 5000000 | 34053,504 | 14692,088 | **2.32** float | REPEATED | 600 | 2,910 | 2,554 | **1.14** float | REPEATED | 3000 | 40,965 | 16,641 | **2.46** float | REPEATED | 40000 | 654,329 | 237,761 | **2.75** float | REPEATED | 800000 | 13094,558 | 3234,352 | **4.05** float | REPEATED | 5000000 | 71522,839 | 23208,431 | **3.08** -- | -- | -- | -- | -- | -- double | RANDOM | 600 | 13,103 | 9,628 | **1.36** double | RANDOM | 3000 | 174,936 | 168,750 | **1.04** double | RANDOM | 40000 | 1094,699 | 780,827 | **1.40** double | RANDOM | 800000 | 20205,605 | 15304,877 | **1.32** double | RANDOM | 5000000 | 147620,455 | 107560,209 | **1.37** double | SHUFFLE | 600 | 7,359 | 6,191 | **1.19** double | SHUFFLE | 3000 | 55,208 | 48,141 | **1.15** double | SHUFFLE | 40000 | 485,528 | 311,081 | **1.56** double | SHUFFLE | 800000 | 7525,265 | 4723,622 | **1.59** double | SHUFFLE | 5000000 | 38030,224 | 38159,739 | **1.00** double | STAGGER | 600 | 14,019 | 3,120 | **4.49** double | STAGGER | 3000 | 24,778 | 14,510 | **1.71** double | STAGGER | 40000 | 241,340 | 151,728 | **1.59** double | STAGGER | 800000 | 5367,318 | 2353,046 | **2.28** double | STAGGER | 5000000 | 38446,590 | 25241,014 | **1.52** double | REPEATED | 600 | 2,986 | 2,586 | **1.15** double | REPEATED | 3000 | 32,133 | 16,018 | **2.01** double | REPEATED | 40000 | 659,864 | 256,873 | **2.57** double | REPEATED | 800000 | 13045,124 | 3823,682 | **3.41** double | REPEATED | 5000000 | 75177,194 | 33156,316 | **2.27** ------------- PR Comment: https://git.openjdk.org/jdk/pull/27411#issuecomment-3340719768
