On Mon, 19 Sep 2022 08:51:24 GMT, Xiaohong Gong <xg...@openjdk.org> wrote:

> "`VectorSupport.indexVector()`" is used to compute a vector that contains the
> index values based on a given vector and a scale value (`i.e. index = vec +
> iota * scale`). This function is widely used in other APIs like
> "`VectorMask.indexInRange`" which is useful to the tail loop vectorization.
> And it can be easily implemented with the vector instructions.
>
> This patch adds the vector intrinsic implementation of it. The steps are:
>
> 1) Load the const "iota" vector.
>
> We extend the "`vector_iota_indices`" stubs from byte to other integral
> types. For floating point vectors, it needs an additional vector cast to get
> the right iota values.
>
> 2) Compute indexes with "`vec + iota * scale`"
>
> Here is the performance result to the new added micro benchmark on ARM NEON:
>
> Benchmark                                 Gain
> IndexVectorBenchmark.byteIndexVector      1.477
> IndexVectorBenchmark.doubleIndexVector    5.031
> IndexVectorBenchmark.floatIndexVector     5.342
> IndexVectorBenchmark.intIndexVector       5.529
> IndexVectorBenchmark.longIndexVector      3.177
> IndexVectorBenchmark.shortIndexVector     5.841
>
>
> Please help to review and share the feedback! Thanks in advance!

src/hotspot/share/opto/vectorIntrinsics.cpp line 2949:

> 2947:     } else if (elem_bt == T_DOUBLE) {
> 2948:       iota = gvn().transform(new VectorCastL2XNode(iota, vt));
> 2949:     }

Since we are loading constants from stub-initialized memory locations, defining new stubs for floating-point iota indices may eliminate the need for costly conversion instructions. On x86 in particular, conversion between long and double is only supported on AVX512DQ targets, so intrinsification may fail on legacy targets.

src/hotspot/share/opto/vectorIntrinsics.cpp line 2978:

> 2976:     case T_DOUBLE: {
> 2977:       scale = gvn().transform(new ConvI2LNode(scale));
> 2978:       scale = gvn().transform(new ConvL2DNode(scale));

A prior target-support check for these IR nodes may prevent surprises in the backend.

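
For reference, the computation quoted above (`index = vec + iota * scale`) can be sketched in scalar Java. This is only a minimal illustration of the semantics, not the intrinsic or the Vector API implementation: the class `IndexVectorSketch` and both helper methods are hypothetical names, plain `int[]` arrays stand in for vectors, and the mask helper is only loosely modeled on how `VectorMask.indexInRange` is described in the PR text.

```java
import java.util.Arrays;

// Hypothetical scalar model; arrays stand in for vector registers.
final class IndexVectorSketch {

    // result[i] = vec[i] + i * scale, where the lane number i plays the role of "iota".
    static int[] indexVector(int[] vec, int scale) {
        int[] result = new int[vec.length];
        for (int i = 0; i < vec.length; i++) {
            result[i] = vec[i] + i * scale;
        }
        return result;
    }

    // Assumed tail-loop use: lane i is set when 0 <= offset + i < limit.
    static boolean[] inRangeMask(int offset, int limit, int lanes) {
        int[] broadcast = new int[lanes];
        Arrays.fill(broadcast, offset);
        int[] idx = indexVector(broadcast, 1);   // offset + iota
        boolean[] mask = new boolean[lanes];
        for (int i = 0; i < lanes; i++) {
            mask[i] = idx[i] >= 0 && idx[i] < limit;
        }
        return mask;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(indexVector(new int[] {10, 10, 10, 10}, 2)));
        System.out.println(Arrays.toString(inRangeMask(6, 8, 4)));
    }
}
```

With four lanes, an offset of 6 and a limit of 8, only the first two lanes of the mask are set, which is the shape a vectorized tail loop needs for its last partial iteration.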
src/hotspot/share/opto/vectorIntrinsics.cpp line 2978:

> 2976:     case T_DOUBLE: {
> 2977:       scale = gvn().transform(new ConvI2LNode(scale));
> 2978:       scale = gvn().transform(new ConvL2DNode(scale));

Is there any specific reason for not directly using ConvI2D in the double case?

-------------

PR: https://git.openjdk.org/jdk/pull/10332
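
On the ConvI2D question above: at the Java language level, `int -> long -> double` and `int -> double` give the same result, because every 32-bit `int` fits exactly in a `double`'s 53-bit significand. The sketch below only checks that value-level equivalence on a few boundary inputs; the class and method names are hypothetical, and it says nothing about which C2 nodes a given backend can match.

```java
// Hypothetical check of value-level equivalence between the two conversion paths.
final class ConvSketch {

    // Mirrors the two-step path in the patch: int -> long -> double.
    static double viaLong(int scale) {
        return (double) (long) scale;
    }

    // Mirrors the single-step path suggested in the review: int -> double.
    static double direct(int scale) {
        return (double) scale;
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 1, 16_777_217, Integer.MAX_VALUE};
        for (int s : samples) {
            // Compare raw bit patterns so the check is exact.
            boolean same = Double.doubleToLongBits(viaLong(s)) == Double.doubleToLongBits(direct(s));
            System.out.println(s + " -> same result: " + same);
        }
    }
}
```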