On Wed, 16 Apr 2025 08:58:34 GMT, Xiaohong Gong <xg...@openjdk.org> wrote:

> ### Summary:
> [JDK-8318650](http://java-service.client.nvidia.com/?q=8318650) added 
> HotSpot intrinsics for the subword gather load APIs on X86 platforms [1]. This 
> patch implements the equivalent functionality for the AArch64 SVE 
> platform. In addition to the AArch64 backend support, it also 
> refactors the API implementation on the Java side and the compiler mid-end 
> to make the operations more efficient and maintainable across 
> architectures.
> 
> ### Background:
> Vector gather load APIs load values from memory addresses calculated by 
> adding a base pointer to integer indices stored in an int array. SVE provides 
> native vector gather load instructions for byte/short types that take the 
> indices in an int vector (see [2][3]).
> 
> The number of loaded elements must match the index vector's element count. 
> Since int elements are 4 and 2 times larger than byte and short elements 
> respectively, and given the `MaxVectorSize` constraint, the operation may 
> need to be split into multiple parts.
> 
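
The splitting rule described above can be sketched as plain arithmetic. This is only an illustrative model, not code from the patch: the class `GatherSplit` and its method names are hypothetical, and it assumes that one gather-load consumes exactly one full register of int indices.

```java
public final class GatherSplit {
    private GatherSplit() {}

    /** Number of 4-byte int indices that fit in one register of maxVectorSize bytes. */
    public static int indicesPerRegister(int maxVectorSize) {
        return maxVectorSize / Integer.BYTES;
    }

    /**
     * Number of gather-load operations needed to fill a subword vector
     * (hypothetical helper; assumes one register of indices per gather-load).
     *
     * @param maxVectorSize     register size in bytes (MaxVectorSize)
     * @param vectorSizeInBytes size of the byte/short vector being loaded
     * @param elemSizeInBytes   1 for byte, 2 for short
     */
    public static int gatherLoadCount(int maxVectorSize, int vectorSizeInBytes, int elemSizeInBytes) {
        int lanes = vectorSizeInBytes / elemSizeInBytes;   // elements to load
        int perOp = indicesPerRegister(maxVectorSize);     // elements per gather-load
        return Math.max(1, (lanes + perOp - 1) / perOp);   // ceiling division, at least one load
    }

    public static void main(String[] args) {
        // 128-bit byte vector, MaxVectorSize = 16
        System.out.println(gatherLoadCount(16, 16, 1)); // prints 4
        // 128-bit byte vector, MaxVectorSize = 32
        System.out.println(gatherLoadCount(32, 16, 1)); // prints 2
    }
}
```

The two calls in `main` correspond to the first two scenarios listed in the description.
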
> Using a 128-bit byte vector gather load as an example, there are four 
> scenarios with different `MaxVectorSize`:
> 
> 1. `MaxVectorSize = 16, byte_vector_size = 16`:
>    - Can load 4 indices per vector register
>    - So each gather-load operation produces 4 bytes
>    - Requires 4 gather-loads and a final merge
>    Example:
>    ```
>    byte[] arr = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, ...]
>    int[] idx = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9]
> 
>    4 gather-load:
>    idx_v1 = [1 4 2 3]    gather_v1 = [0000 0000 0000 becd]
>    idx_v2 = [2 5 7 5]    gather_v2 = [0000 0000 0000 cfhf]
>    idx_v3 = [1 7 6 0]    gather_v3 = [0000 0000 0000 bhga]
>    idx_v4 = [9 11 10 15] gather_v4 = [0000 0000 0000 jlkp]
>    merge: v = [jlkp bhga cfhf becd]
>    ```
> 
> 2. `MaxVectorSize = 32, byte_vector_size = MaxVectorSize / 2`:
>    - Can load 8 indices per vector register
>    - So each gather-load operation produces 8 bytes
>    - Requires 2 gather-loads and a merge
>    Example:
>    ```
>    byte[] arr = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, ...]
>    int[] index = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9]
> 
>    2 gather-load:
>    idx_v1 = [2 5 7 5 1 4 2 3]
>    idx_v2 = [9 11 10 15 1 7 6 0]
>    gather_v1 = [0000 0000 0000 0000 0000 0000 cfhf becd]
>    gather_v2 = [0000 0000 0000 0000 0000 0000 jlkp bhga]
>    merge: v = [0000 0000 0000 0000 jlkp bhga cfhf becd]
>    ```
> 
> 3. `MaxVectorSize = 64, byte_vector_size = MaxVectorSize / 4`:
>    - Can load 16 indices per vector register
>    - So can ...
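
For readers following the quoted examples, the gather-and-merge steps can be reproduced with a scalar sketch. It is not the intrinsic or the patch's implementation: `GatherMergeSketch` and its methods are hypothetical names, the "merge" is modelled as copying each partial result into its slice of the final array, and the letters stand in for byte values as in the examples.

```java
import java.nio.charset.StandardCharsets;

public final class GatherMergeSketch {
    private GatherMergeSketch() {}

    /** One "gather-load": reads count elements of arr at idx[offset .. offset + count). */
    public static byte[] gatherPart(byte[] arr, int[] idx, int offset, int count) {
        byte[] part = new byte[count];
        for (int i = 0; i < count; i++) {
            part[i] = arr[idx[offset + i]];
        }
        return part;
    }

    /** Gathers in chunks of indicesPerOp and merges the chunks in lane order. */
    public static byte[] gatherInParts(byte[] arr, int[] idx, int indicesPerOp) {
        byte[] result = new byte[idx.length];
        for (int offset = 0; offset < idx.length; offset += indicesPerOp) {
            int count = Math.min(indicesPerOp, idx.length - offset);
            byte[] part = gatherPart(arr, idx, offset, count);
            System.arraycopy(part, 0, result, offset, count); // merge step
        }
        return result;
    }

    /** Renders a vector with the highest lane first, as the quoted examples do. */
    public static String highLaneFirst(byte[] v) {
        return new StringBuilder(new String(v, StandardCharsets.US_ASCII)).reverse().toString();
    }

    public static void main(String[] args) {
        byte[] arr = "abcdefghijklmnop".getBytes(StandardCharsets.US_ASCII);
        int[] idx = {3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9};

        // Scenario 1: 4 indices per gather-load, so 4 gather-loads plus merge.
        System.out.println(highLaneFirst(gatherInParts(arr, idx, 4))); // prints jlkpbhgacfhfbecd
        // Scenario 2: 8 indices per gather-load, same final 16 bytes.
        System.out.println(highLaneFirst(gatherInParts(arr, idx, 8))); // prints jlkpbhgacfhfbecd
    }
}
```

Both chunk sizes yield the same 16 bytes, which matches the `merge` lines in the first two examples once the spaces are removed.
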

Changes requested by syan (Committer).

test/hotspot/jtreg/compiler/vectorapi/VectorGatherSubwordTest.java line 39:

> 37:  * @modules jdk.incubator.vector
> 38:  *
> 39:  * @run driver compiler.vectorapi.VectorGatherSubwordTest

Should we use `@run main` instead of `@run driver`?

-------------

PR Review: https://git.openjdk.org/jdk/pull/24679#pullrequestreview-2780137136
PR Review Comment: https://git.openjdk.org/jdk/pull/24679#discussion_r2051625676
