On Thu, 29 Aug 2024 05:42:58 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
>> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support >> for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector<E>.selectFrom(Vector<E> v1, Vector<E> v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the >> values stored in first (v1) and second (v2) vector arguments. Thus, first >> and second vector serves as a table, whose elements are selected based on >> index value vector. API is applicable to all integral and floating-point >> types. The result of this operation is semantically equivalent to >> expression v1.rearrange(this.toShuffle(), v2). Values held in index vector >> lanes must lie within valid two vector index range [0, 2*VLEN) else an >> IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a >> lowering transformation dismantles new IR into constituent IR supported by >> target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing >> rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt >> Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 >> ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 >> ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 >> ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 >> ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 >> ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 >> ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 >> ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 >> ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 >> ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 >> ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 >> ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional > commit since the last revision: > > Adding descriptive comments I left a few comments, hopefully I can spend some more time on this next week src/hotspot/cpu/x86/matcher_x86.hpp line 215: > 213: } > 214: > 215: static bool vector_indexes_needs_massaging(BasicType ety, int vlen) { The name "massaging" sounds quite vague. Can we have something more expressive / descriptive? Is it the vector that "needs" massaging or the indices that "need" massaging? Why `ety` and not `bt`? Is that not the name we use most often? src/hotspot/cpu/x86/x86.ad line 10490: > 10488: > 10489: > 10490: instruct selectFromTwoVec_evex(vec dst, vec src1, vec src2) You could rename `dst` -> `mask_and_dst`. That would maybe help the reader to more quickly know that it is an input-mask and output-dst. src/hotspot/share/opto/vectorIntrinsics.cpp line 2716: > 2714: C->set_max_vector_size(MAX2(C->max_vector_size(), (uint)(num_elem * > type2aelembytes(elem_bt)))); > 2715: return true; > 2716: } The code in these methods are extremely duplicated. Unboxing and boxing in every method around here. Maybe not your problem in this PR. BTW: your error logging used `v1` in all 3 cases `op1-3`, you probably want to give them useful names. `v1-3` probably? All this copy-pasting makes it easy to miss updating some cases... like it happenend here. src/hotspot/share/opto/vectornode.cpp line 2090: > 2088: int num_elem = vect_type()->length(); > 2089: BasicType elem_bt = vect_type()->element_basic_type(); > 2090: if (Matcher::match_rule_supported_vector(Op_SelectFromTwoVector, > num_elem, elem_bt)) { Suggestion: // Keep the node if it is supported, else lower it to other nodes. if (Matcher::match_rule_supported_vector(Op_SelectFromTwoVector, num_elem, elem_bt)) { src/hotspot/share/opto/vectornode.cpp line 2095: > 2093: Node* index_vec = in(1); > 2094: Node* src1 = in(2); > 2095: Node* src2 = in(3); Suggestion: Node* src1 = in(2); Node* src2 = in(3); unnecessary spaces src/hotspot/share/opto/vectornode.cpp line 2101: > 2099: // (VectorBlend > 2100: // (VectorRearrange SRC1, INDEX) > 2101: // (VectorRearrange SRC2, NORM_INDEX) Suggestion: // (VectorRearrange SRC1 INDEX) // (VectorRearrange SRC2 NORM_INDEX) Either consistently use commas or none at all ;) src/hotspot/share/opto/vectornode.cpp line 2104: > 2102: // MASK) > 2103: // This shall prevent an intrinsification failure and associated > argument > 2104: // boxing penalties. A quick comment about how the mask is computed could be nice. `MASK = INDEX < num_elem` src/hotspot/share/opto/vectornode.cpp line 2126: > 2124: case T_FLOAT: > 2125: return phase->transform(new VectorCastF2XNode(index_vec, > TypeVect::make(T_INT, num_elem))); > 2126: break; `break` after `return`? src/hotspot/share/opto/vectornode.cpp line 2141: > 2139: default: return elem_bt; > 2140: } > 2141: }; This is definitely a style question. But it might be nice to make these functions member functions. They now kinda disrupt the flow of the `::Ideal` method. And in some cases you use the captured variables, and in other cases you pass them in explicitly, even though they already exist in the captured scope... consistency would be nice. src/hotspot/share/opto/vectornode.cpp line 2148: > 2146: > 2147: BoolTest::mask pred = BoolTest::lt; > 2148: ConINode* pred_node = (ConINode*)phase->makecon(TypeInt::make(pred)); Would `as_ConI()` be a better alternative to the `(ConINode*)` cast? src/hotspot/share/opto/vectornode.cpp line 2149: > 2147: BoolTest::mask pred = BoolTest::lt; > 2148: ConINode* pred_node = (ConINode*)phase->makecon(TypeInt::make(pred)); > 2149: Node* lane_cnt = phase->makecon(lane_count_type()); Hmm. I don't like to have different names for the same thing. `num_elem` and `lane_count` and `lane_cnt`. What about a method `make_num_elem_node`, returns a `ConNode*`. Then you pass it around as `num_elem_scalar`, and broadcast it to `num_elem_vector`. src/hotspot/share/opto/vectornode.cpp line 2159: > 2157: > 2158: vmask_type = TypeVect::makemask(elem_bt, num_elem); > 2159: mask = phase->transform(new VectorMaskCastNode(mask, vmask_type)); I would just have two variables, and not overwrite it: `integral_vmask_type` and `vmask_type`. Maybe also `mask` could be split into two variables? src/hotspot/share/opto/vectornode.cpp line 2181: > 2179: default: return elem_bt; > 2180: } > 2181: }; You are now using this twice. Is there not some method that already does this? src/hotspot/share/opto/vectornode.cpp line 2183: > 2181: }; > 2182: // Targets emulating unsupported permutation for certain vector > types > 2183: // may need to message the indexes to match the users intent. Suggestion: // may need to massage the indexes to match the users intent. src/hotspot/share/opto/vectornode.hpp line 1272: > 1270: }; > 1271: > 1272: spurious newline src/hotspot/share/opto/vectornode.hpp line 1621: > 1619: public: > 1620: SelectFromTwoVectorNode(Node* in1, Node* in2, Node* in3, const > TypeVect* vt) > 1621: : VectorNode(in1, in2, in3, vt) {} I would prefer more expressive variable names and a short specification what the node does. Otherwise one always has to reverse-engineer what inputs are acceptable etc. I mean you could even require `VectorNode*` as inputs. ------------- PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2272308274 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738648483 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738738172 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738759799 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738767466 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738768205 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738765017 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738823939 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738808199 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738781635 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738787762 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738806978 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738814420 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738838073 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738835911 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738729168 PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738866021