On Thu, 29 Aug 2024 05:42:58 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:

>> Hi All,
>> 
>> As per the discussion on panama-dev mailing list[1], patch adds the support 
>> for following new two vector permutation APIs.
>> 
>> 
>> Declaration:-
>>     Vector<E>.selectFrom(Vector<E> v1, Vector<E> v2)
>> 
>> 
>> Semantics:-
>>     Using index values stored in the lanes of "this" vector, assemble the 
>> values stored in first (v1) and second (v2) vector arguments. Thus, first 
>> and second vector serves as a table, whose elements are selected based on 
>> index value vector. API is applicable to all integral and floating-point 
>> types.  The result of this operation is semantically equivalent to 
>> expression v1.rearrange(this.toShuffle(), v2). Values held in index vector 
>> lanes must lie within valid two vector index range [0, 2*VLEN) else an 
>> IndexOutOfBoundException is thrown.  
>> 
>> Summary of changes:
>> -  Java side implementation of new selectFrom API.
>> -  C2 compiler IR and inline expander changes.
>> -  In absence of direct two vector permutation instruction in target ISA, a 
>> lowering transformation dismantles new IR into constituent IR supported by 
>> target platforms. 
>> -  Optimized x86 backend implementation for AVX512 and legacy target.
>> -  Function tests covering new API.
>> 
>> JMH micro included with this patch shows around 10-15x gain over existing 
>> rearrange API :-
>> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
>> 
>> 
>>   Benchmark                                     (size)   Mode  Cnt      
>> Score   Error   Units
>> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762  
>>         ops/ms
>> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550  
>>         ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605  
>>         ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004  
>>         ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758  
>>         ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192  
>>         ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459  
>>         ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556  
>>         ops/ms
>> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830  
>>         ops/ms
>> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174  
>>         ops/ms
>> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804  
>>         ops/ms
>> S...
>
> Jatin Bhateja has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Adding descriptive comments

I left a few comments, hopefully I can spend some more time on this next week

src/hotspot/cpu/x86/matcher_x86.hpp line 215:

> 213:   }
> 214: 
> 215:   static bool vector_indexes_needs_massaging(BasicType ety, int vlen) {

The name "massaging" sounds quite vague. Can we have something more expressive 
/ descriptive? Is it the vector that "needs" massaging or the indices that 
"need" massaging?

Why `ety` and not `bt`? Is that not the name we use most often?

src/hotspot/cpu/x86/x86.ad line 10490:

> 10488: 
> 10489: 
> 10490: instruct selectFromTwoVec_evex(vec dst, vec src1, vec src2)

You could rename `dst` -> `mask_and_dst`. That would maybe help the reader to 
more quickly know that it is an input-mask and output-dst.

src/hotspot/share/opto/vectorIntrinsics.cpp line 2716:

> 2714:   C->set_max_vector_size(MAX2(C->max_vector_size(), (uint)(num_elem * 
> type2aelembytes(elem_bt))));
> 2715:   return true;
> 2716: }

The code in these methods are extremely duplicated. Unboxing and boxing in 
every method around here. Maybe not your problem in this PR.

BTW: your error logging used `v1` in all 3 cases `op1-3`, you probably want to 
give them useful names. `v1-3` probably?

All this copy-pasting makes it easy to miss updating some cases... like it 
happenend here.

src/hotspot/share/opto/vectornode.cpp line 2090:

> 2088:   int num_elem = vect_type()->length();
> 2089:   BasicType elem_bt = vect_type()->element_basic_type();
> 2090:   if (Matcher::match_rule_supported_vector(Op_SelectFromTwoVector, 
> num_elem, elem_bt)) {

Suggestion:

  // Keep the node if it is supported, else lower it to other nodes.
  if (Matcher::match_rule_supported_vector(Op_SelectFromTwoVector, num_elem, 
elem_bt)) {

src/hotspot/share/opto/vectornode.cpp line 2095:

> 2093:   Node* index_vec = in(1);
> 2094:   Node* src1  = in(2);
> 2095:   Node* src2  = in(3);

Suggestion:

  Node* src1 = in(2);
  Node* src2 = in(3);

unnecessary spaces

src/hotspot/share/opto/vectornode.cpp line 2101:

> 2099:   //     (VectorBlend
> 2100:   //         (VectorRearrange SRC1, INDEX)
> 2101:   //         (VectorRearrange SRC2, NORM_INDEX)

Suggestion:

  //         (VectorRearrange SRC1 INDEX)
  //         (VectorRearrange SRC2 NORM_INDEX)

Either consistently use commas or none at all ;)

src/hotspot/share/opto/vectornode.cpp line 2104:

> 2102:   //         MASK)
> 2103:   // This shall prevent an intrinsification failure and associated 
> argument
> 2104:   // boxing penalties.

A quick comment about how the mask is computed could be nice.
`MASK = INDEX < num_elem`

src/hotspot/share/opto/vectornode.cpp line 2126:

> 2124:       case T_FLOAT:
> 2125:         return phase->transform(new VectorCastF2XNode(index_vec, 
> TypeVect::make(T_INT, num_elem)));
> 2126:       break;

`break` after `return`?

src/hotspot/share/opto/vectornode.cpp line 2141:

> 2139:       default: return elem_bt;
> 2140:     }
> 2141:   };

This is definitely a style question. But it might be nice to make these 
functions member functions. They now kinda disrupt the flow of the `::Ideal` 
method. And in some cases you use the captured variables, and in other cases 
you pass them in explicitly, even though they already exist in the captured 
scope... consistency would be nice.

src/hotspot/share/opto/vectornode.cpp line 2148:

> 2146: 
> 2147:   BoolTest::mask pred = BoolTest::lt;
> 2148:   ConINode* pred_node = (ConINode*)phase->makecon(TypeInt::make(pred));

Would `as_ConI()` be a better alternative to the `(ConINode*)` cast?

src/hotspot/share/opto/vectornode.cpp line 2149:

> 2147:   BoolTest::mask pred = BoolTest::lt;
> 2148:   ConINode* pred_node = (ConINode*)phase->makecon(TypeInt::make(pred));
> 2149:   Node* lane_cnt = phase->makecon(lane_count_type());

Hmm. I don't like to have different names for the same thing. `num_elem` and 
`lane_count` and `lane_cnt`.

What about a method `make_num_elem_node`, returns a `ConNode*`.
Then you pass it around as `num_elem_scalar`, and broadcast it to 
`num_elem_vector`.

src/hotspot/share/opto/vectornode.cpp line 2159:

> 2157: 
> 2158:   vmask_type = TypeVect::makemask(elem_bt, num_elem);
> 2159:   mask = phase->transform(new VectorMaskCastNode(mask, vmask_type));

I would just have two variables, and not overwrite it: `integral_vmask_type` 
and `vmask_type`. Maybe also `mask` could be split into two variables?

src/hotspot/share/opto/vectornode.cpp line 2181:

> 2179:         default: return elem_bt;
> 2180:       }
> 2181:     };

You are now using this twice. Is there not some method that already does this?

src/hotspot/share/opto/vectornode.cpp line 2183:

> 2181:     };
> 2182:     // Targets emulating unsupported permutation for certain vector 
> types
> 2183:     // may need to message the indexes to match the users intent.

Suggestion:

    // may need to massage the indexes to match the users intent.

src/hotspot/share/opto/vectornode.hpp line 1272:

> 1270: };
> 1271: 
> 1272: 

spurious newline

src/hotspot/share/opto/vectornode.hpp line 1621:

> 1619: public:
> 1620:   SelectFromTwoVectorNode(Node* in1, Node* in2, Node* in3, const 
> TypeVect* vt)
> 1621:   : VectorNode(in1, in2, in3, vt) {}

I would prefer more expressive variable names and a short specification what 
the node does. Otherwise one always has to reverse-engineer what inputs are 
acceptable etc. I mean you could even require `VectorNode*` as inputs.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2272308274
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738648483
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738738172
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738759799
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738767466
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738768205
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738765017
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738823939
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738808199
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738781635
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738787762
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738806978
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738814420
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738838073
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738835911
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738729168
PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1738866021

Reply via email to