On Tue, 21 Mar 2023 16:16:31 GMT, Quan Anh Mai <qa...@openjdk.org> wrote:
>> Hi, >> >> This patch reimplements `VectorShuffle` implementations to be a vector of >> the bit type. Currently, VectorShuffle is stored as a byte array, and would >> be expanded upon usage. This poses several drawbacks: >> >> 1. Inefficient conversions between a shuffle and its corresponding vector. >> This hinders the performance when the shuffle indices are not constant and >> are loaded or computed dynamically. >> 2. Redundant expansions in `rearrange` operations. On all platforms, it >> seems that a shuffle index vector is always expanded to the correct type >> before executing the `rearrange` operations. >> 3. Some redundant intrinsics are needed to support this handling as well as >> special considerations in the C2 compiler. >> 4. Range checks are performed using `VectorShuffle::toVector`, which is >> inefficient for FP types since both FP conversions and FP comparisons are >> more expensive than the integral ones. >> >> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask >> ; >> {external_word} >> vpackusdw %xmm0,%xmm0,%xmm0 >> vpackuswb %xmm0,%xmm0,%xmm0 >> vpmovsxbd %xmm0,%xmm3 >> vpcmpgtd %xmm3,%xmm1,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fc2acb4e0d8 >> vpmovzxbd %xmm0,%xmm0 >> vpermd %ymm2,%ymm0,%ymm0 >> movabs $0x751588f98,%r10 ; {oop([I{0x0000000751588f98})} >> vmovdqu %xmm0,0x10(%r10) >> >> After: >> movabs $0x751589c78,%r10 ; {oop([I{0x0000000751589c78})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158ac88,%r10 ; {oop([I{0x000000075158ac88})} >> vmovdqu 0x10(%r10),%xmm2 >> vpxor %xmm0,%xmm0,%xmm0 >> vpcmpgtd %xmm2,%xmm0,%xmm3 >> vtestps %xmm3,%xmm3 >> jne 0x00007fa818b27cb1 >> vpermd %ymm1,%ymm2,%ymm0 >> movabs $0x751588c68,%r10 ; {oop([I{0x0000000751588c68})} >> vmovdqu %xmm0,0x10(%r10) >> >> Please take a look and leave reviews. Thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with two additional > commits since the last revision: > > - missing casts > - clean up src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractShuffle.java line 118: > 116: return (VectorShuffle<E>) > v.rearrange(shuffle.cast(vspecies().asIntegral())) > 117: .toShuffle() > 118: .cast(vspecies()); Style issue. Suggest to change to: return (VectorShuffle<E>) v.rearrange(shuffle.cast(vspecies().asIntegral())) .toShuffle() .cast(vspecies()); I also noticed that the similar shuffle cast code is used more frequently. Could we wrap such code `toShuffle().cast(vspecies())` to a separate method? src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractShuffle.java line 130: > 128: } else { > 129: v = v.blend(v.lanewise(VectorOperators.ADD, length()), > 130: v.compare(VectorOperators.LT, 0)); Style issue. Suggest to change to: v = v.blend(v.lanewise(VectorOperators.ADD, length()), v.compare(VectorOperators.LT, 0)); src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java line 198: > 196: if ((length() & (length() - 1)) != 0) { > 197: return wrap ? shuffleFromOp(i -> > (VectorIntrinsics.wrapToRange(i * step + start, length()))) > 198: : shuffleFromOp(i -> i * step + start); Code style issue. Suggest to: return wrap ? shuffleFromOp(i -> (VectorIntrinsics.wrapToRange(i * step + start, length()))) : shuffleFromOp(i -> i * step + start); src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java line 204: > 202: Vector iota = species.iota(); > 203: iota = iota.lanewise(VectorOperators.MUL, step) > 204: .lanewise(VectorOperators.ADD, start); Style issue. Same as above. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1144384585 PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1144389023 PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1144390218 PR Review Comment: https://git.openjdk.org/jdk/pull/13093#discussion_r1144390692