On Tue, 24 Sep 2024 07:10:24 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
>> Hi All,
>> 
>> As per the discussion on panama-dev mailing list[1], patch adds the support
>> for following new two vector permutation APIs.
>> 
>> 
>> Declaration:-
>>     Vector<E>.selectFrom(Vector<E> v1, Vector<E> v2)
>> 
>> 
>> Semantics:-
>>     Using index values stored in the lanes of "this" vector, assemble the
>> values stored in first (v1) and second (v2) vector arguments. Thus, first
>> and second vector serves as a table, whose elements are selected based on
>> index value vector. API is applicable to all integral and floating-point
>> types. The result of this operation is semantically equivalent to
>> expression v1.rearrange(this.toShuffle(), v2). Values held in index vector
>> lanes must lie within valid two vector index range [0, 2*VLEN) else an
>> IndexOutOfBoundException is thrown.
>> 
>> Summary of changes:
>> -  Java side implementation of new selectFrom API.
>> -  C2 compiler IR and inline expander changes.
>> -  In absence of direct two vector permutation instruction in target ISA, a
>> lowering transformation dismantles new IR into constituent IR supported by
>> target platforms.
>> -  Optimized x86 backend implementation for AVX512 and legacy target.
>> -  Function tests covering new API.
>> 
>> JMH micro included with this patch shows around 10-15x gain over existing
>> rearrange API :-
>> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
>> 
>> 
>> Benchmark                                     (size)   Mode  Cnt      Score   Error   Units
>> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762          ops/ms
>> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550          ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605          ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004          ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758          ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192          ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459          ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556          ops/ms
>> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830          ops/ms
>> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174          ops/ms
>> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804          ops/ms
>> S...
> 
> Jatin Bhateja has updated the pull request incrementally with one additional
> commit since the last revision:
> 
>   Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w
>   128 and 2048 bits at 128 bit increments.

src/hotspot/share/opto/vectorIntrinsics.cpp line 2689:

> 2687:       !arch_supports_vector(cast_vopc, num_elem, T_BYTE, VecMaskNotUsed) ||
> 2688:       !arch_supports_vector(Op_VectorLoadShuffle, num_elem, index_elem_bt, VecMaskNotUsed) ||
> 2689:       !arch_supports_vector(Op_Replicate, num_elem, T_BYTE, VecMaskNotUsed)) {

Where SelectFromTwoVector is not supported, the alternate implementation is part of SelectFromTwoVectorNode::Ideal() rather than right here. A comment is needed both here and in the Ideal() implementation so that these checks are kept in sync.
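
To make the intent of that fallback concrete, here is a minimal scalar sketch in Java of the two-vector selectFrom semantics from the PR description, next to a lowered form built from two single-vector selections and a blend. This is an illustration under assumptions, not the C2 code: the names SelectFromModel, selectFrom and selectFromLowered are hypothetical, the lane count is assumed to be a power of two, and I assume indexes in [0, VLEN) pick from v1 while indexes in [VLEN, 2*VLEN) pick from v2.

```java
// Scalar model only; hypothetical names, not the Vector API or HotSpot code.
final class SelectFromModel {

    // Reference semantics as described in the PR text: each lane of `index`
    // selects one element from the 2*VLEN entry table formed by v1 and v2.
    // Assumption: v1 is the low half of the table and v2 the high half.
    static int[] selectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = v1.length;
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = index[i];
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException(
                        "index " + idx + " outside [0, " + (2 * vlen) + ")");
            }
            result[i] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    // Lowered form: permute each source independently with the wrapped
    // index, then blend on a mask derived from the original index.
    // Assumes vlen is a power of two and indexes are already validated.
    static int[] selectFromLowered(int[] index, int[] v1, int[] v2) {
        int vlen = v1.length;
        int laneCntM1 = vlen - 1;
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int wrapped = index[i] & laneCntM1;        // index within one vector
            boolean fromSecond = index[i] > laneCntM1; // blend mask lane
            int p1 = v1[wrapped];                      // permute of v1
            int p2 = v2[wrapped];                      // permute of v2
            result[i] = fromSecond ? p2 : p1;          // blend
        }
        return result;
    }
}
```

With v1 = {10, 11, 12, 13}, v2 = {20, 21, 22, 23} and index = {0, 5, 3, 7}, both methods return {10, 21, 13, 23}. Whatever set of conditions guards the intrinsic, the lowered path has to agree with the reference path on every valid index, which is why the two sets of checks need to stay in sync.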

src/hotspot/share/opto/vectornode.cpp line 2120:

> 2118:   // are held in a byte vector which are later transformed to target specific permutation
> 2119:   // index format by subsequent VectorLoadShuffle.
> 2120:   int cast_vopc = VectorCastNode::opcode(0, index_elem_bt, true);

Good to use -1 when we are not sending an actual opcode:
int cast_vopc = VectorCastNode::opcode(-1, index_elem_bt, true);

src/hotspot/share/opto/vectornode.cpp line 2126:

> 2124:   Node* bcast_lane_cnt_m1_vec = phase->transform(VectorNode::scalar2vector(lane_cnt_m1, num_elem, Type::get_const_basic_type(T_BYTE), false));
> 2125: 
> 2126:   // Compute the blend mask for merging two indipendently permututed vectors

Typo indipendently -> independently

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781867326
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781873682
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1781888912