On Mon, 19 Sep 2022 08:51:24 GMT, Xiaohong Gong <xg...@openjdk.org> wrote:

> "`VectorSupport.indexVector()`" is used to compute a vector that contains the 
> index values based on a given vector and a scale value (`i.e. index = vec + 
> iota * scale`). This function is widely used in other APIs like 
> "`VectorMask.indexInRange`" which is useful to the tail loop vectorization. 
> And it can be easily implemented with the vector instructions.
> 
> This patch adds the vector intrinsic implementation of it. The steps are:
> 
>   1) Load the const "iota" vector.
> 
>   We extend the "`vector_iota_indices`" stubs from byte to the other integral 
> types. For floating point vectors, an additional vector cast is needed to get 
> the right iota values.
> 
>   2) Compute indexes with "`vec + iota * scale`"
> 
> Here is the performance result of the newly added micro benchmark on ARM NEON:
> 
> Benchmark                              Gain
> IndexVectorBenchmark.byteIndexVector   1.477
> IndexVectorBenchmark.doubleIndexVector 5.031
> IndexVectorBenchmark.floatIndexVector  5.342
> IndexVectorBenchmark.intIndexVector    5.529
> IndexVectorBenchmark.longIndexVector   3.177
> IndexVectorBenchmark.shortIndexVector  5.841
> 
> 
> Please help to review and share your feedback! Thanks in advance!

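For readers less familiar with the C2 side, here is a rough sketch of the
expansion the description above outlines, for a T_INT vector. Apart from the
node classes quoted in the comments below, the names here (in particular the
load_iota_from_stub() helper and the input names vec/scale) are assumptions
for illustration, not the actual patch code:

    // Sketch only: index = vec + iota * scale (assumed names, not the patch)
    Node* iota   = load_iota_from_stub(elem_bt, vt);   // hypothetical helper: load the
                                                       // constant iota vector from the stub
    Node* bscale = gvn().transform(VectorNode::scalar2vector(scale, num_elem,
                                   Type::get_const_basic_type(elem_bt)));
    Node* mul    = gvn().transform(VectorNode::make(Op_MulI, iota, bscale, num_elem, elem_bt));
    Node* index  = gvn().transform(VectorNode::make(Op_AddI, vec, mul, num_elem, elem_bt));
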
src/hotspot/share/opto/vectorIntrinsics.cpp line 2949:

> 2947:   } else if (elem_bt == T_DOUBLE) {
> 2948:     iota = gvn().transform(new VectorCastL2XNode(iota, vt));
> 2949:   }

Since we are loading constants from stub-initialized memory locations, defining 
new stubs for the floating point iota indices may eliminate the need for costly 
conversion instructions. In particular, on x86 the conversion between Long and 
Double is only supported on AVX512DQ targets, so intrinsification may fail on 
legacy targets.
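To make the suggestion concrete (purely conceptual, not actual stub-generator
code): the new stubs would hold the iota constants already in the floating
point element type, so the loaded vector could be used directly without a
VectorCastL2X:

    // Conceptual layout of per-type iota tables (names and sizes are made up).
    static const float  iota_indices_float[8]  = {0.0f, 1.0f, 2.0f, 3.0f,
                                                  4.0f, 5.0f, 6.0f, 7.0f};
    static const double iota_indices_double[8] = {0.0, 1.0, 2.0, 3.0,
                                                  4.0, 5.0, 6.0, 7.0};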

src/hotspot/share/opto/vectorIntrinsics.cpp line 2978:

> 2976:       case T_DOUBLE: {
> 2977:         scale = gvn().transform(new ConvI2LNode(scale));
> 2978:         scale = gvn().transform(new ConvL2DNode(scale));

A prior target support check for these IR nodes may prevent surprises in the 
backend.
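Something along these lines, before creating the conversion nodes (sketch only;
the exact guard and bailout style used in vectorIntrinsics.cpp may differ):

    // Bail out of intrinsification if the backend cannot match the scalar
    // conversions we are about to emit; the Java fallback is used instead.
    if (!Matcher::match_rule_supported(Op_ConvI2L) ||
        !Matcher::match_rule_supported(Op_ConvL2D)) {
      if (C->print_intrinsics()) {
        tty->print_cr("  ** Rejected: ConvI2L/ConvL2D not supported on this target");
      }
      return false;
    }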

src/hotspot/share/opto/vectorIntrinsics.cpp line 2978:

> 2976:       case T_DOUBLE: {
> 2977:         scale = gvn().transform(new ConvI2LNode(scale));
> 2978:         scale = gvn().transform(new ConvL2DNode(scale));

Any specific reason for not directly using ConvI2D for the double case?
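i.e. something like the following (sketch; assumes ConvI2DNode is supported on
the targets of interest):

    case T_DOUBLE: {
      scale = gvn().transform(new ConvI2DNode(scale));
      break;
    }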

-------------

PR: https://git.openjdk.org/jdk/pull/10332
