On Tue, 18 Oct 2022 01:44:21 GMT, Xiaohong Gong <xg...@openjdk.org> wrote:
>> "`VectorSupport.indexVector()`" is used to compute a vector that contains
>> the index values based on a given vector and a scale value (`i.e. index =
>> vec + iota * scale`). This function is widely used in other APIs like
>> "`VectorMask.indexInRange`" which is useful to the tail loop vectorization.
>> And it can be easily implemented with the vector instructions.
>>
>> This patch adds the vector intrinsic implementation of it. The steps are:
>>
>> 1) Load the const "iota" vector.
>>
>> We extend the "`vector_iota_indices`" stubs from byte to other integral
>> types. For floating point vectors, it needs an additional vector cast to get
>> the right iota values.
>>
>> 2) Compute indexes with "`vec + iota * scale`"
>>
>> Here is the performance result to the new added micro benchmark on ARM NEON:
>>
>> Benchmark                                 Gain
>> IndexVectorBenchmark.byteIndexVector      1.477
>> IndexVectorBenchmark.doubleIndexVector    5.031
>> IndexVectorBenchmark.floatIndexVector     5.342
>> IndexVectorBenchmark.intIndexVector       5.529
>> IndexVectorBenchmark.longIndexVector      3.177
>> IndexVectorBenchmark.shortIndexVector     5.841
>>
>>
>> Please help to review and share the feedback! Thanks in advance!
>
> Xiaohong Gong has updated the pull request with a new target base due to a
> merge or a rebase. The incremental webrev excludes the unrelated changes
> brought in by the merge/rebase. The pull request contains three additional
> commits since the last revision:
>
>  - Add the floating point support for VectorLoadConst and remove the
>    VectorCast
>  - Merge branch 'master' into JDK-8293409
>  - 8293409: [vectorapi] Intrinsify VectorSupport.indexVector

Hi @XiaohongGong, patch now shows significant gains on both AVX512 and legacy X86 targets.
X86 and common IR changes LGTM, thanks!

-------------

Marked as reviewed by jbhateja (Reviewer).

PR: https://git.openjdk.org/jdk/pull/10332
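
For readers following the thread, the per-lane computation described in the quoted text (`index = vec + iota * scale`) can be sketched in plain scalar Java. This is only an illustration of the semantics, not the intrinsic or the JDK source: the class `IndexVectorSketch` and both methods are hypothetical names, arrays stand in for vectors, and the range check is only loosely modeled on what `VectorMask.indexInRange` is described as doing for tail loops.

```java
import java.util.Arrays;

// Hypothetical scalar model of the computation discussed above.
// Arrays stand in for vectors; nothing here calls the Vector API.
public class IndexVectorSketch {

    // result[lane] = vec[lane] + iota[lane] * scale, where iota = {0, 1, 2, ...}
    static int[] indexVector(int[] vec, int scale) {
        int[] result = new int[vec.length];
        for (int lane = 0; lane < vec.length; lane++) {
            int iota = lane; // step 1: the constant "iota" value for this lane
            result[lane] = vec[lane] + iota * scale; // step 2: vec + iota * scale
        }
        return result;
    }

    // Assumed tail-loop use: a lane is active when 0 <= offset + lane < limit.
    static boolean[] indexInRange(int offset, int limit, int lanes) {
        int[] broadcast = new int[lanes];
        Arrays.fill(broadcast, offset);
        int[] index = indexVector(broadcast, 1);
        boolean[] mask = new boolean[lanes];
        for (int lane = 0; lane < lanes; lane++) {
            mask[lane] = index[lane] >= 0 && index[lane] < limit;
        }
        return mask;
    }

    public static void main(String[] args) {
        // Four lanes, each starting at 10, scale 2.
        System.out.println(Arrays.toString(indexVector(new int[] {10, 10, 10, 10}, 2)));
        // Tail of an 8-element array processed 4 lanes at a time, starting at offset 6.
        System.out.println(Arrays.toString(indexInRange(6, 8, 4)));
    }
}
```

With four lanes starting at offset 6 and a limit of 8, only the first two lanes are in range, which is the kind of mask a vectorized tail loop would use to avoid reading past the end of an array.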