Hi All,

As per the discussion on the panama-dev mailing list[1], this patch adds support
for the following new two-vector permutation API.


Declaration:-
    Vector<E>.selectFrom(Vector<E> v1, Vector<E> v2)


Semantics:-
    Using the index values stored in the lanes of "this" vector, assemble the
values stored in the first (v1) and second (v2) vector arguments. The first and
second vectors thus serve as a table whose elements are selected by the index
vector. The API is applicable to all integral and floating-point types. The
result of this operation is semantically equivalent to the expression
v1.rearrange(this.toShuffle(), v2). Values held in the index vector lanes must
lie within the valid two-vector index range [0, 2*VLEN); otherwise an
IndexOutOfBoundsException is thrown.
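
To make the semantics above concrete, here is a scalar model written with plain
int arrays instead of the Vector API. It is only a sketch, not the code in this
patch: the class and method names are hypothetical, and it assumes my reading of
the description, namely that indices in [0, VLEN) pick from v1 and indices in
[VLEN, 2*VLEN) pick from v2.

```java
import java.util.Arrays;

public class SelectFromScalarModel {
    // Scalar model of index.selectFrom(v1, v2) as described above.
    // Assumption: v1 and v2 act as one concatenated table of 2*VLEN entries,
    // with v1 occupying slots [0, VLEN) and v2 occupying [VLEN, 2*VLEN).
    public static int[] selectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all vectors must have the same lane count");
        }
        int[] result = new int[vlen];
        for (int lane = 0; lane < vlen; lane++) {
            int idx = index[lane];
            // Indices outside the two-vector range are rejected.
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException(
                    "index " + idx + " outside [0, " + (2 * vlen) + ")");
            }
            result[lane] = (idx < vlen) ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] index = {0, 5, 3, 6};
        // Prints [10, 21, 13, 22]
        System.out.println(Arrays.toString(selectFrom(index, v1, v2)));
    }
}
```

With VLEN = 4, index 5 maps to lane 1 of v2 and index 6 to lane 2 of v2, while
indices 0 and 3 read v1 directly.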

Summary of changes:
-  Java side implementation of the new selectFrom API.
-  C2 compiler IR and inline expander changes.
-  In the absence of a direct two-vector permutation instruction in the target
ISA, a lowering transformation dismantles the new IR into constituent IR
supported by the target platform.
-  Optimized x86 backend implementation for AVX512 and legacy targets.
-  Functional tests covering the new API.
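
The lowering mentioned in the summary is not spelled out in this mail. One
plausible decomposition, shown here purely as an assumption and not taken from
the patch, is two single-vector rearranges followed by a blend on the mask
"index >= VLEN". The scalar sketch below models that idea with int arrays; all
names are hypothetical, and range checking is omitted, so indices are assumed to
already lie in [0, 2*VLEN).

```java
import java.util.Arrays;

public class SelectFromLoweringSketch {
    // Single-vector rearrange: result[i] = src[idx[i]].
    public static int[] rearrange(int[] src, int[] idx) {
        int[] out = new int[idx.length];
        for (int lane = 0; lane < idx.length; lane++) {
            out[lane] = src[idx[lane]];
        }
        return out;
    }

    // Hypothetical lowering: wrap the indices into [0, VLEN), permute each
    // source separately, then blend the two results lane by lane.
    public static int[] loweredSelectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        boolean[] fromSecond = new boolean[vlen];
        int[] wrapped = new int[vlen];
        for (int lane = 0; lane < vlen; lane++) {
            fromSecond[lane] = index[lane] >= vlen;   // mask selecting v2
            wrapped[lane] = fromSecond[lane] ? index[lane] - vlen : index[lane];
        }
        int[] p1 = rearrange(v1, wrapped);
        int[] p2 = rearrange(v2, wrapped);
        int[] result = new int[vlen];
        for (int lane = 0; lane < vlen; lane++) {
            result[lane] = fromSecond[lane] ? p2[lane] : p1[lane];   // blend
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] index = {0, 5, 3, 6};
        // Prints [10, 21, 13, 22]
        System.out.println(Arrays.toString(loweredSelectFrom(index, v1, v2)));
    }
}
```

A decomposition of this shape costs two permutes, a compare, and a blend, which
suggests why a direct two-vector permute instruction is preferred where the ISA
offers one.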

The JMH microbenchmark included with this patch shows around a 10-15x throughput
gain over the existing rearrange API:
Test System: Intel(R) Xeon(R) Platinum 8480+ [Sapphire Rapids Server]


Benchmark                                     (size)   Mode  Cnt      Score  Error   Units
SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762         ops/ms
SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550         ops/ms
SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605         ops/ms
SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004         ops/ms
SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758         ops/ms
SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192         ops/ms
SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459         ops/ms
SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556         ops/ms
SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830         ops/ms
SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174         ops/ms
SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804         ops/ms
SelectFromBenchmark.selectFromIntVector         2048  thrpt    2   5398.244         ops/ms
SelectFromBenchmark.selectFromLongVector        1024  thrpt    2   5856.859         ops/ms
SelectFromBenchmark.selectFromLongVector        2048  thrpt    2   1513.378         ops/ms
SelectFromBenchmark.selectFromShortVector       1024  thrpt    2  17888.617         ops/ms
SelectFromBenchmark.selectFromShortVector       2048  thrpt    2   9079.565         ops/ms


Kindly review and share your feedback.

Best Regards,
Jatin

[1] https://mail.openjdk.org/pipermail/panama-dev/2024-May/020408.html

-------------

Commit messages:
 - Adding Benchmark
 - 8338023: Support two vector selectFrom API

Changes: https://git.openjdk.org/jdk/pull/20508/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8338023
  Stats: 2737 lines in 95 files changed: 2719 ins; 17 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20508.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508

PR: https://git.openjdk.org/jdk/pull/20508
