> Hi All,
>
> As per the discussion on the panama-dev mailing list[1], this patch adds support
> for the following new two-vector permutation API.
>
>
> Declaration:
> Vector<E>.selectFrom(Vector<E> v1, Vector<E> v2)
>
>
> Semantics:
> Using the index values stored in the lanes of "this" vector, assemble the
> values stored in the first (v1) and second (v2) vector arguments. The first and
> second vectors thus serve as a table whose elements are selected by the index
> vector. The API is applicable to all integral and floating-point types.
> The result of this operation is semantically equivalent to the expression
> v1.rearrange(this.toShuffle(), v2). Values held in the index vector lanes must
> lie within the valid two-vector index range [0, 2*VLEN), else an
> IndexOutOfBoundsException is thrown.
>
> Summary of changes:
> - Java-side implementation of the new selectFrom API.
> - C2 compiler IR and inline expander changes.
> - In the absence of a direct two-vector permutation instruction in the target ISA,
>   a lowering transformation dismantles the new IR into constituent IR supported
>   by target platforms.
> - Optimized x86 backend implementation for AVX512 and legacy targets.
> - Functional tests covering the new API.
>
> The JMH micro-benchmark included with this patch shows around a 10-15x gain over
> the existing rearrange API:
> Test System: Intel(R) Xeon(R) Platinum 8480+ [Sapphire Rapids Server]
>
>
>   Benchmark                                     (size)   Mode  Cnt      Score  Error   Units
>   SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762         ops/ms
>   SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550         ops/ms
>   SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605         ops/ms
>   SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004         ops/ms
>   SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758         ops/ms
>   SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192         ops/ms
>   SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459         ops/ms
>   SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556         ops/ms
>   SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830         ops/ms
>   SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174         ops/ms
>   SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804         ops/ms
>   SelectFromBenchmark.selectFromIntVector         2048  thrpt    2  5398.2...
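To make the quoted selectFrom semantics concrete, here is a minimal scalar sketch in Java that uses plain int[] arrays in place of vector lanes (VLEN is simply the array length). The class SelectFromModel and its methods are hypothetical names for illustration only; they are not part of jdk.incubator.vector and do not reflect the patch's code. The second method shows one common way to decompose a two-vector permutation into two single-vector permutes followed by a lane blend. It is offered as an assumption about the general technique behind a lowering like the one described in the summary, not as the actual C2 IR sequence the patch emits.

```java
import java.util.Arrays;

// Hypothetical scalar model of the two-vector selectFrom semantics.
// int[] arrays stand in for vector lanes; not the jdk.incubator.vector API.
final class SelectFromModel {

    // Range check for the valid two-vector index range [0, 2*VLEN).
    private static int checkedIndex(int idx, int vlen) {
        if (idx < 0 || idx >= 2 * vlen) {
            throw new IndexOutOfBoundsException(
                "index " + idx + " out of range [0, " + (2 * vlen) + ")");
        }
        return idx;
    }

    // Direct model: lane i takes v1[idx] if idx < VLEN, else v2[idx - VLEN].
    static int[] selectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all vectors must have the same lane count");
        }
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = checkedIndex(index[i], vlen);
            result[i] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    // Assumed decomposition (illustrative only): permute each table with the
    // in-table part of the index, then blend on whether the index hit v2.
    static int[] selectFromLowered(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all vectors must have the same lane count");
        }
        int[] local = new int[vlen];
        boolean[] fromSecond = new boolean[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = checkedIndex(index[i], vlen);
            local[i] = idx % vlen;       // lane index within a single table
            fromSecond[i] = idx >= vlen; // blend mask: true selects v2's lane
        }
        int[] p1 = permute(v1, local);
        int[] p2 = permute(v2, local);
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            result[i] = fromSecond[i] ? p2[i] : p1[i];
        }
        return result;
    }

    // Single-vector permutation: out[i] = v[shuffle[i]].
    private static int[] permute(int[] v, int[] shuffle) {
        int[] out = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[shuffle[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] index = {0, 5, 2, 7};
        System.out.println(Arrays.toString(selectFrom(index, v1, v2)));        // [10, 21, 12, 23]
        System.out.println(Arrays.toString(selectFromLowered(index, v1, v2))); // [10, 21, 12, 23]
    }
}
```

With VLEN = 4, index 5 maps to lane 1 of v2 and index 7 maps to lane 3 of v2, while any index of 8 or more (or a negative index) raises IndexOutOfBoundsException, matching the [0, 2*VLEN) range stated above.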
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Review comments resolutions.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20508/files
  - new: https://git.openjdk.org/jdk/pull/20508/files/6cb1a46d..408a8694

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=04-05

  Stats: 112 lines in 7 files changed: 91 ins; 14 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/20508.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508

PR: https://git.openjdk.org/jdk/pull/20508