On Thu, 5 Oct 2023 23:36:48 GMT, Srinivas Vamsi Parasa <d...@openjdk.org> wrote:

>> The goal is to develop faster sort routines for x86_64 CPUs by taking 
>> advantage of AVX512 instructions. This enhancement provides an order of 
>> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
>> 
>> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and 
>> upto ~4.5x improvement for 64-bit datatypes (long, double) as shown in the 
>> performance data below.
>> 
>> 
>> **Arrays.sort performance data using JMH benchmarks for arrays with random 
>> data** 
>> 
>> |    Arrays.sort benchmark   |       Array Size      |       Baseline 
>> (us/op)        |       AVX512 Sort (us/op)     |       Speedup |
>> |    ---     |       ---     |       ---     |       ---     |       ---     
>> |
>> |    ArraysSort.doubleSort   |       10      |       0.034   |       0.035   
>> |       1.0     |
>> |    ArraysSort.doubleSort   |       25      |       0.116   |       0.089   
>> |       1.3     |
>> |    ArraysSort.doubleSort   |       50      |       0.282   |       0.291   
>> |       1.0     |
>> |    ArraysSort.doubleSort   |       75      |       0.474   |       0.358   
>> |       1.3     |
>> |    ArraysSort.doubleSort   |       100     |       0.654   |       0.623   
>> |       1.0     |
>> |    ArraysSort.doubleSort   |       1000    |       9.274   |       6.331   
>> |       1.5     |
>> |    ArraysSort.doubleSort   |       10000   |       323.339 |       71.228  
>> |       **4.5** |
>> |    ArraysSort.doubleSort   |       100000  |       4471.871        |       
>> 1002.748        |       **4.5** |
>> |    ArraysSort.doubleSort   |       1000000 |       51660.742       |       
>> 12921.295       |       **4.0** |
>> |    ArraysSort.floatSort    |       10      |       0.045   |       0.046   
>> |       1.0     |
>> |    ArraysSort.floatSort    |       25      |       0.103   |       0.084   
>> |       1.2     |
>> |    ArraysSort.floatSort    |       50      |       0.285   |       0.33    
>> |       0.9     |
>> |    ArraysSort.floatSort    |       75      |       0.492   |       0.346   
>> |       1.4     |
>> |    ArraysSort.floatSort    |       100     |       0.597   |       0.326   
>> |       1.8     |
>> |    ArraysSort.floatSort    |       1000    |       9.811   |       5.294   
>> |       1.9     |
>> |    ArraysSort.floatSort    |       10000   |       323.955 |       50.547  
>> |       **6.4** |
>> |    ArraysSort.floatSort    |       100000  |       4326.38 |       731.152 
>> |       **5.9** |
>> |    ArraysSort.floatSort    |       1000000 |       52413.88        |       
>> 8409.193        |       **6.2** |
>> |    ArraysSort.intSort      |       10      |       0.033   |       0.033   
>> |       1.0     |
>> |    ArraysSort.intSort      |       25      |       0.086   |       0.051   
>> |       1.7     |
>> |    ArraysSort.intSort      |       50      |       0.236   |       0.151   
>> |       1.6     |
>> |    ArraysSort.intSort      |       75      |       0.416   |       0.332   
>> |       1.3     |
>> |    ArraysSort.intSort      |       100     |       0.63    |       0.521   
>> |       1.2     |
>> |    ArraysSort.intSort      |       1000    |       10.518  |       4.698   
>> |       2.2     |
>> |    ArraysSort.intSort      |       10000   |       309.659 |       42.518  
>> |       **7.3** |
>> |    ArraysSort.intSort      |       100000  |       4130.917        |       
>> 573.956 |       **7.2** |
>> |    ArraysSort.intSort      |       1000000 |       49876.307       |       
>> 6712.812        |       **7.4** |
>> |    ArraysSort.longSort     |       10      |       0.036   |       0.037   
>> |       1.0     |
>> |    ArraysSort.longSort     |       25      |       0.094   |       0.08    
>> |       1.2     |
>> |    ArraysSort.longSort     |       50      |       0.218   |       0.227   
>> |       1.0     |
>> |    ArraysSort.longSort     |       75      |       0.466   |       0.402   
>> |       1.2     |
>> |    ArraysSort.longSort     |       100     |       0.76    |       0.58    
>> |       1.3     |
>> |    ArraysSort.longSort     |       1000    |       10.449  |       6....
>
> Srinivas Vamsi Parasa has updated the pull request with a new target base due 
> to a merge or a rebase. The pull request now contains 45 commits:
> 
>  - fix code style and formatting
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into avx512sort
>  - Update CompileThresholdScaling only for the sort and partition intrinsics; 
> update build script to remove nested if
>  - change variable names of indexPivot* to pivotIndex*
>  - Update DualPivotQuicksort.java
>  - Rename arraySort and arrayPartition Java methods to sort and partition. 
> Cleanup some comments
>  - Remove the unnecessary exception in single pivot partitioning fallback 
> method
>  - Move functional interfaces close to the associated methods
>  - Refactor the sort and partition intrinsics to accept method references for 
> fallback functions
>  - Refactor stub handling to use a generic function for all types
>  - ... and 35 more: https://git.openjdk.org/jdk/compare/a1c9587c...a5262d86

> The goal is to develop faster sort routines for x86_64 CPUs by taking 
> advantage of AVX512 instructions. This enhancement provides an order of 
> magnitude speedup for Arrays.sort() using int, long, float and double arrays.
> 
> This PR shows upto ~7x improvement for 32-bit datatypes (int, float) and upto 
> ~4.5x improvement for 64-bit datatypes (long, double) as shown in the 
> performance data below.
> 
> **Arrays.sort performance data using JMH benchmarks for arrays with random 
> data**
> 
> Arrays.sort benchmark Array Size      Baseline (us/op)        AVX512 Sort 
> (us/op)     Speedup
> ArraysSort.doubleSort 10      0.034   0.035   1.0
> ArraysSort.doubleSort 25      0.116   0.089   1.3
> ArraysSort.doubleSort 50      0.282   0.291   1.0
> ArraysSort.doubleSort 75      0.474   0.358   1.3
> ArraysSort.doubleSort 100     0.654   0.623   1.0
> ArraysSort.doubleSort 1000    9.274   6.331   1.5
> ArraysSort.doubleSort 10000   323.339 71.228  **4.5**
> ArraysSort.doubleSort 100000  4471.871        1002.748        **4.5**
> ArraysSort.doubleSort 1000000 51660.742       12921.295       **4.0**
> ArraysSort.floatSort  10      0.045   0.046   1.0
> ArraysSort.floatSort  25      0.103   0.084   1.2
> ArraysSort.floatSort  50      0.285   0.33    0.9
> ArraysSort.floatSort  75      0.492   0.346   1.4
> ArraysSort.floatSort  100     0.597   0.326   1.8
> ArraysSort.floatSort  1000    9.811   5.294   1.9
> ArraysSort.floatSort  10000   323.955 50.547  **6.4**
> ArraysSort.floatSort  100000  4326.38 731.152 **5.9**
> ArraysSort.floatSort  1000000 52413.88        8409.193        **6.2**
> ArraysSort.intSort    10      0.033   0.033   1.0
> ArraysSort.intSort    25      0.086   0.051   1.7
> ArraysSort.intSort    50      0.236   0.151   1.6
> ArraysSort.intSort    75      0.416   0.332   1.3
> ArraysSort.intSort    100     0.63    0.521   1.2
> ArraysSort.intSort    1000    10.518  4.698   2.2
> ArraysSort.intSort    10000   309.659 42.518  **7.3**
> ArraysSort.intSort    100000  4130.917        573.956 **7.2**
> ArraysSort.intSort    1000000 49876.307       6712.812        **7.4**
> ArraysSort.longSort   10      0.036   0.037   1.0
> ArraysSort.longSort   25      0.094   0.08    1.2
> ArraysSort.longSort   50      0.218   0.227   1.0
> ArraysSort.longSort   75      0.466   0.402   1.2
> ArraysSort.longSort   100     0.76    0.58    1.3
> ArraysSort.longSort   1000    10.449  6.239   1.7
> ArraysSort.longSort   10000   307.074 70.284  **4.4**
> ArraysSort.longSort   100000  4135.651        1049.12 **3.9**
> ArraysSort.longSort   1000000 49559.152       12537.53        **4.0**
> This is the first in the series of PRs related to vectorized sorting. Future 
> planned steps after the integration of this PR:
> 
> 1. Enabling AVX2 sort and partitioning for Linux ( based on Intel's 
> x86-simd-sort [PR](https://github.com/intel/x86-simd-sort/pull/60)).
> 2. Enabling SIMD sort (both AVX512 and AVX2) for Short and Char arrays for 
> length < 1750. (Arrays larger than 1750 already use countingSort which is 
> faster than AVX512 sort.)
> 3. Adding Windows support for both AVX512 and AVX2 using assembly stubs.
> 4. Enabling SIMD sort (both AVX512 and AVX2) for MemorySegment.
> 
> ### Progress
> * [x]  Change must be properly reviewed (1 review required, with at least 1 
> [Reviewer](https://openjdk.org/bylaws#reviewer))
> * [x]  Change must not contain extraneous whitespace
> * [x]  Commit message must refer to an issue
> 
> ### Issue
> * [JDK-8309130](https://bugs.openjdk.org/browse/JDK-8309130): x86_64 AVX512 
> intrinsics for Arrays.sort methods (int, long, float and double arrays) 
> (**Enhancement** - P4)
> 
> ### Reviewers
> * [Jatin Bhateja](https://openjdk.org/census#jbhateja) (@jatin-bhateja - 
> **Reviewer**) ⚠️ Review applies to 
> [e63a2aa0](https://git.openjdk.org/jdk/pull/14227/files/e63a2aa081275c3f1ed2ccc4315a60f304d18b34)
> * [Sandhya Viswanathan](https://openjdk.org/census#sviswanathan) (@sviswa7 - 
> **Reviewer**) ⚠️ Review applies to 
> [9642d852](https://git.openjdk.org/jdk/pull/14227/files/9642d852cce5a9cf8270b850c124ef38fc158c6d)
> * [Paul Sandoz](https://openjdk.org/census#psandoz) (@PaulSandoz - 
> **Reviewer**) ⚠️ Review applies to 
> [b04cb6c3](https://git.openjdk.org/jdk/pull/14227/files/b04cb6c3c41c7327f9dc67653e24b08693329e3e)
> * [Vladimir Kozlov](https://openjdk.org/census#kvn) (@vnkozlov - **Reviewer**)
> 
> ### Reviewing
> Using `git`
> Using Skara CLI tools
> Using diff file
> ### Webrev
> [Link to Webrev 
> Comment](https://git.openjdk.org/jdk/pull/14227#issuecomment-1568931001)

hello, I tested this feature in openjdk 22.19, but it didn't seem to work.   
my test code is as follows:   

public class JDKSort {
    public static void main(String[] args) {
        new JDKSort().sort();
    }

    public void sort() {
        // 100 million pieces of data
        int size = 100000000;
        int[] arr = new int[size];
        java.util.Random rand = new java.util.Random();
        for (int i = 0; i < size; ++i) {
            arr[i] = rand.nextInt();
        }

        long startTime = System.currentTimeMillis();
        java.util.Arrays.sort(arr);
        long elapseTime = System.currentTimeMillis() - startTime;
        System.out.println("elapse time -> " + elapseTime + " ms");
    }
}

test result:
- jdk8:   15079 ms
- jdk22:  11619 ms

this feature looks like it's merged into 22.19:   
[x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double 
arrays)](https://bugs.openjdk.org/browse/JDK-8309130)   
but it doesn't seem to working, do I need to do anything?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1761008818

Reply via email to