On Thu, 16 Nov 2023 08:44:26 GMT, Eric Liu <e...@openjdk.org> wrote:

>> Vector API defines zero-extend operations [1], which are going to be
>> intrinsified and generated to `VectorUCastNode` by C2. This patch adds
>> backend implementation for `VectorUCastNode` on AArch64.
>>
>> The micro benchmark shows significant performance improvement. In my test
>> machine (SVE, 256-bit), the result is shown as below:
>>
>>     Benchmark                     Before      After        Units   Gain
>>     VectorZeroExtend.byte2Int     3168.251    243012.399   ops/ms  75.70
>>     VectorZeroExtend.byte2Long    3212.201    216291.588   ops/ms  66.33
>>     VectorZeroExtend.byte2Short   3391.968    182655.365   ops/ms  52.85
>>     VectorZeroExtend.int2Long     1012.197     80448.553   ops/ms  78.48
>>     VectorZeroExtend.short2Int    1812.471    153416.828   ops/ms  83.65
>>     VectorZeroExtend.short2Long   1788.382    129794.814   ops/ms  71.58
>>
>> On other Neon systems, we can get similar performance boost as a result of
>> intrinsification success.
>>
>> Since `VectorUCastNode` only used in Vector API's zero extension currently,
>> this patch also adds assertion on nodes' definitions to clarify their usages.
>>
>> [TEST]
>> compiler/vectorapi and jdk/incubator/vector passed on NEON and SVE machines.
>>
>> [1]
>> https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java#L726
>
> Eric Liu has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   update m4
>
>   Change-Id: I82bf5f9384f79e09965a0498ad2de45cec6f0a29

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 1352:

> 1350:       // 4B/8B to 4S/8S
> 1351:       assert(dst_vlen_in_bytes == 8 || dst_vlen_in_bytes == 16, "unsupported");
> 1352:       (this->*ext)(dst, T8H, src, T8B);

One thing that might make this cleaner: I suggest you make `_xshll` protected
rather than private, then write `_xshll(is_unsigned, dst, T8H, src, T8B, 0);`
here.

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 1415:

> 1413:         break;
> 1414:       case S:
> 1415:         (this->*unpklo)(dst, H, src);

As above: try making `is_unsigned` a parameter.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16670#discussion_r1395359612
PR Review Comment: https://git.openjdk.org/jdk/pull/16670#discussion_r1395362740
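
To illustrate the two suggestions above, here is a minimal, self-contained sketch contrasting a call through a member-function pointer with a call to a shared helper that takes `is_unsigned` as a parameter. It is not HotSpot code: `ToyAssembler`, `ToyMacroAssembler`, `xshll`, `extend_via_pointer` and `extend_via_flag` are hypothetical names, and plain vectors stand in for vector registers. The only assumption carried over from the review is that a helper like `_xshll` takes the signedness flag first and a shift amount last.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Bytes = std::vector<int8_t>;   // stand-in for a register holding byte lanes
using Lanes = std::vector<int16_t>;  // stand-in for a register holding halfword lanes

// Toy stand-in for the base assembler. NOT HotSpot code.
class ToyAssembler {
 protected:
  // Hypothetical analogue of _xshll: widen each byte lane to 16 bits,
  // zero- or sign-extending according to the flag, then scale by 2^shift.
  // Protected (not private) so that the subclass below can call it directly.
  Lanes xshll(bool is_unsigned, const Bytes& src, int shift) {
    assert(shift >= 0 && shift < 8);
    Lanes dst;
    dst.reserve(src.size());
    for (int8_t b : src) {
      int widened = is_unsigned ? static_cast<int>(static_cast<uint8_t>(b))
                                : static_cast<int>(b);
      dst.push_back(static_cast<int16_t>(widened * (1 << shift)));
    }
    return dst;
  }
};

// Toy stand-in for the macro assembler that sits on top of the base class.
class ToyMacroAssembler : public ToyAssembler {
 public:
  Lanes sxtl(const Bytes& src) { return xshll(false, src, 0); }  // sign-extend
  Lanes uxtl(const Bytes& src) { return xshll(true, src, 0); }   // zero-extend

  // Style under review: select a member-function pointer, then call through it.
  Lanes extend_via_pointer(bool is_unsigned, const Bytes& src) {
    using ExtFn = Lanes (ToyMacroAssembler::*)(const Bytes&);
    ExtFn ext = is_unsigned ? &ToyMacroAssembler::uxtl : &ToyMacroAssembler::sxtl;
    return (this->*ext)(src);
  }

  // Suggested style: pass the flag straight to the shared helper.
  Lanes extend_via_flag(bool is_unsigned, const Bytes& src) {
    return xshll(is_unsigned, src, 0);
  }
};
```

Both methods produce the same lanes for the same flag; the flag-based version avoids the pointer-to-member syntax at each call site, at the cost of widening the helper's visibility from private to protected.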