The Vector API defines zero-extend operations [1], which C2 intrinsifies into `VectorUCastNode`. This patch adds the backend implementation of `VectorUCastNode` on AArch64.
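
For readers unfamiliar with the operation, here is a minimal scalar sketch of what a zero-extending cast computes per lane: the source bits are kept and the upper bits of the wider type are filled with zeros instead of copies of the sign bit. The class and method names (`ZeroExtendSketch`, `byte2Int`, `short2Long`, `int2Long`) are hypothetical and chosen only to mirror the benchmark names; this is not code from the patch, and it does not use the Vector API itself (where such conversions are requested through operators like `VectorOperators.ZERO_EXTEND_B2I`).

```java
import java.util.Arrays;

// Scalar reference for per-lane zero extension.
// Hypothetical helper class for illustration; not part of the patch.
final class ZeroExtendSketch {

    // byte -> int: keep the low 8 bits, clear the upper 24 bits.
    static int[] byte2Int(byte[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = Byte.toUnsignedInt(src[i]);
        }
        return dst;
    }

    // short -> long: keep the low 16 bits, clear the upper 48 bits.
    static long[] short2Long(short[] src) {
        long[] dst = new long[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = Short.toUnsignedLong(src[i]);
        }
        return dst;
    }

    // int -> long: keep the low 32 bits, clear the upper 32 bits.
    static long[] int2Long(int[] src) {
        long[] dst = new long[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = Integer.toUnsignedLong(src[i]);
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] bytes = {1, -1, (byte) 0x80};
        // A sign-extending cast would turn -1 into -1; zero extension gives 255.
        System.out.println(Arrays.toString(byte2Int(bytes))); // prints [1, 255, 128]
    }
}
```

A vectorized implementation performs the same conversion on all lanes at once, which is the work the new AArch64 backend rules take over from the non-intrinsified fallback path.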
The microbenchmark shows a significant performance improvement. On my test machine (SVE, 256-bit), the results are as follows:

Benchmark                      Before      After        Units    Gain
VectorZeroExtend.byte2Int      3168.251    243012.399   ops/ms   75.70
VectorZeroExtend.byte2Long     3212.201    216291.588   ops/ms   66.33
VectorZeroExtend.byte2Short    3391.968    182655.365   ops/ms   52.85
VectorZeroExtend.int2Long      1012.197    80448.553    ops/ms   78.48
VectorZeroExtend.short2Int     1812.471    153416.828   ops/ms   83.65
VectorZeroExtend.short2Long    1788.382    129794.814   ops/ms   71.58

On other NEON systems, we get a similar performance boost because intrinsification now succeeds.

Since `VectorUCastNode` is currently used only for the Vector API's zero extension, this patch also adds assertions to the node definitions to clarify their usage.

[TEST]
compiler/vectorapi and jdk/incubator/vector passed on NEON and SVE machines.

[1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java#L726

-------------

Commit messages:
 - 8319872: AArch64: [vectorapi] Implementation of unsigned (zero extended) casts

Changes: https://git.openjdk.org/jdk/pull/16670/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16670&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8319872
  Stats: 376 lines in 7 files changed: 337 ins; 0 del; 39 mod
  Patch: https://git.openjdk.org/jdk/pull/16670.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16670/head:pull/16670

PR: https://git.openjdk.org/jdk/pull/16670