On Sun, 20 Apr 2025 03:28:48 GMT, SendaoYan <s...@openjdk.org> wrote:

>> ### Summary:
>> [JDK-8318650](http://java-service.client.nvidia.com/?q=8318650) added
>> HotSpot intrinsics for the subword gather load APIs on X86 platforms [1].
>> This patch implements the equivalent functionality for the AArch64 SVE
>> platform. In addition to the AArch64 backend support, this patch also
>> refactors the API implementation on the Java side and the compiler mid-end
>> part to make the operations more efficient and maintainable across
>> different architectures.
>>
>> ### Background:
>> Vector gather load APIs load values from memory addresses calculated by
>> adding a base pointer to integer indices stored in an int array. SVE
>> provides native vector gather load instructions for byte/short types using
>> an int vector holding the indices (see [2][3]).
>>
>> The number of loaded elements must match the index vector's element count.
>> Since int elements are 4 and 2 times larger than byte and short elements,
>> respectively, and given `MaxVectorSize` constraints, the operation may need
>> to be split into multiple parts.
>>
>> Using a 128-bit byte vector gather load as an example, there are four
>> scenarios with different `MaxVectorSize`:
>>
>> 1. `MaxVectorSize = 16, byte_vector_size = 16`:
>>    - Can load 4 indices per vector register
>>    - So can finish 4 bytes per gather-load operation
>>    - Requires 4 gather-loads and a final merge
>>    Example:
>>    ```
>>    byte[] arr = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, ...]
>>    int[] idx = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9]
>>
>>    4 gather-load:
>>    idx_v1 = [1 4 2 3]      gather_v1 = [0000 0000 0000 becd]
>>    idx_v2 = [2 5 7 5]      gather_v2 = [0000 0000 0000 cfhf]
>>    idx_v3 = [1 7 6 0]      gather_v3 = [0000 0000 0000 bhga]
>>    idx_v4 = [9 11 10 15]   gather_v4 = [0000 0000 0000 jlkp]
>>    merge: v = [jlkp bhga cfhf becd]
>>    ```
>>
>> 2. `MaxVectorSize = 32, byte_vector_size = MaxVectorSize / 2`:
>>    - Can load 8 indices per vector register
>>    - So can finish 8 bytes per gather-load operation
>>    - Requires 2 gather-loads and a merge
>>    Example:
>>    ```
>>    byte[] arr = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, ...]
>>    int[] idx = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9]
>>
>>    2 gather-load:
>>    idx_v1 = [2 5 7 5 1 4 2 3]
>>    idx_v2 = [9 11 10 15 1 7 6 0]
>>    gather_v1 = [0000 0000 0000 0000 0000 0000 cfhf becd]
>>    gather_v2 = [0000 0000 0000 0000 0000 0000 jlkp bhga]
>>    merge: v = [0000 0000 0000 0000 jlkp bhga cfhf becd]
>>    ```
>>
>> 3. `MaxVectorSize = 64, byte_v...
>
> test/hotspot/jtreg/compiler/vectorapi/VectorGatherSubwordTest.java line 39:
>
>> 37:  * @modules jdk.incubator.vector
>> 38:  *
>> 39:  * @run driver compiler.vectorapi.VectorGatherSubwordTest
>
> Should we use `@run main` instead of `@run driver`?

Thanks for taking a look at this PR! I think it's fine to use `@run main` instead.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24679#discussion_r2053187161
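
For readers following the quoted background, here is a minimal scalar sketch of the split-and-merge idea from the first two scenarios. It is only a model in plain Java, not the intrinsic or the code in this PR; the class name `SubwordGatherSketch`, the method `gatherBytes`, and the parameter `lanesPerGather` are hypothetical names chosen for illustration. Each partial gather handles as many indices as fit in one int index vector, and the partial results are merged by placing them at consecutive lane offsets.

```java
import java.nio.charset.StandardCharsets;

public class SubwordGatherSketch {

    /**
     * Scalar model of a byte gather load that is split into parts.
     * lanesPerGather is the number of int indices that fit in one index
     * vector (4 for MaxVectorSize = 16, 8 for MaxVectorSize = 32).
     * Hypothetical helper for illustration only.
     */
    public static byte[] gatherBytes(byte[] arr, int[] idx, int lanesPerGather) {
        byte[] result = new byte[idx.length];
        for (int part = 0; part < idx.length; part += lanesPerGather) {
            int lanes = Math.min(lanesPerGather, idx.length - part);
            // One "gather-load": fetch one byte per index in this part.
            byte[] partial = new byte[lanes];
            for (int lane = 0; lane < lanes; lane++) {
                partial[lane] = arr[idx[part + lane]];
            }
            // "Merge": place the partial result at its lane offset.
            System.arraycopy(partial, 0, result, part, lanes);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] arr = "abcdefghijklmnop".getBytes(StandardCharsets.US_ASCII);
        int[] idx = {3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9};
        // Four partial gathers of 4 lanes each, as in scenario 1.
        byte[] v = gatherBytes(arr, idx, 4);
        // Printed lowest lane first, so this is the reverse of the
        // highest-lane-first notation [jlkp bhga cfhf becd].
        System.out.println(new String(v, StandardCharsets.US_ASCII)); // prints dcebfhfcaghbpklj
    }
}
```

Calling `gatherBytes(arr, idx, 8)` models the second scenario (two partial gathers) and yields the same merged result; only the number of parts changes.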