On Sun, 20 Apr 2025 03:28:48 GMT, SendaoYan <s...@openjdk.org> wrote:

>> ### Summary:
>> [JDK-8318650](http://java-service.client.nvidia.com/?q=8318650) added 
>> HotSpot intrinsics for the subword gather load APIs on x86 platforms [1]. 
>> This patch implements the equivalent functionality for the AArch64 SVE 
>> platform. In addition to the AArch64 backend support, this patch also 
>> refactors the API implementation on the Java side and the compiler mid-end 
>> to make the operations more efficient and maintainable across different 
>> architectures.
>> 
>> ### Background:
>> Vector gather load APIs load values from memory addresses calculated by 
>> adding a base pointer to integer indices stored in an int array. SVE 
>> provides native vector gather load instructions for byte/short types that 
>> take their indices from an int vector (see [2][3]).
>> 
>> The number of loaded elements must match the index vector's element count. 
>> Since int elements are 4/2 times larger than byte/short elements, and given 
>> `MaxVectorSize` constraints, the operation may need to be split into 
>> multiple parts.
>> 
>> Using a 128-bit byte vector gather load as an example, there are four 
>> scenarios with different `MaxVectorSize`:
>> 
>> 1. `MaxVectorSize = 16, byte_vector_size = 16`:
>>    - Can load 4 indices per vector register
>>    - So each gather-load operation produces 4 bytes
>>    - Requires 4 gather-loads and a final merge
>>    Example:
>>    ```
>>    byte[] arr = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, ...]
>>    int[] idx = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9]
>> 
>>    4 gather-load:
>>    idx_v1 = [1 4 2 3]    gather_v1 = [0000 0000 0000 becd]
>>    idx_v2 = [2 5 7 5]    gather_v2 = [0000 0000 0000 cfhf]
>>    idx_v3 = [1 7 6 0]    gather_v3 = [0000 0000 0000 bhga]
>>    idx_v4 = [9 11 10 15] gather_v4 = [0000 0000 0000 jlkp]
>>    merge: v = [jlkp bhga cfhf becd]
>>    ```
>> 
>> 2. `MaxVectorSize = 32, byte_vector_size = MaxVectorSize / 2`:
>>    - Can load 8 indices per vector register
>>    - So each gather-load operation produces 8 bytes
>>    - Requires 2 gather-loads and a merge
>>    Example:
>>    ```
>>    byte[] arr = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, ...]
>>    int[] idx = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9]
>> 
>>    2 gather-load:
>>    idx_v1 = [2 5 7 5 1 4 2 3]
>>    idx_v2 = [9 11 10 15 1 7 6 0]
>>    gather_v1 = [0000 0000 0000 0000 0000 0000 cfhf becd]
>>    gather_v2 = [0000 0000 0000 0000 0000 0000 jlkp bhga]
>>    merge: v = [0000 0000 0000 0000 jlkp bhga cfhf becd]
>>    ```
>> 
>> 3. `MaxVectorSize = 64, byte_v...
>
> test/hotspot/jtreg/compiler/vectorapi/VectorGatherSubwordTest.java line 39:
> 
>> 37:  * @modules jdk.incubator.vector
>> 38:  *
>> 39:  * @run driver compiler.vectorapi.VectorGatherSubwordTest
> 
> Should we use `@run main` instead of `@run driver`?

Thanks for taking a look at this PR! I think it's fine to use `@run main` 
instead.
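
For readers skimming the quoted description, the gather semantics and the 
pass counts from scenarios 1 and 2 can be sketched in scalar Java. This is 
only an illustration under stated assumptions, not the code in the patch: 
the class `GatherSketch` and the methods `gather` and `gatherPasses` are 
hypothetical names, and the pass count assumes each gather-load is limited 
only by how many int indices fit in one register of `MaxVectorSize` bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical scalar model of a byte gather load; names are illustrative only.
class GatherSketch {
    // Reference semantics: lane i receives arr[base + idx[i]].
    public static byte[] gather(byte[] arr, int base, int[] idx, int lanes) {
        byte[] out = new byte[lanes];
        for (int i = 0; i < lanes; i++) {
            out[i] = arr[base + idx[i]];
        }
        return out;
    }

    // Assumed model: one pass handles as many lanes as int indices fit in a
    // register of maxVectorSize bytes; the lane count is rounded up to whole passes.
    public static int gatherPasses(int maxVectorSize, int lanes) {
        int indicesPerPass = maxVectorSize / Integer.BYTES;
        return (lanes + indicesPerPass - 1) / indicesPerPass;
    }

    public static void main(String[] args) {
        byte[] arr = "abcdefghijklmnop".getBytes(StandardCharsets.US_ASCII);
        int[] idx = {3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9};

        // Lane 0 is printed first, so this is the merged vector from the
        // quoted example read from right to left.
        byte[] v = gather(arr, 0, idx, idx.length);
        System.out.println(new String(v, StandardCharsets.US_ASCII));

        // 128-bit byte vector (16 lanes) under two MaxVectorSize settings.
        System.out.println(gatherPasses(16, 16)); // scenario 1
        System.out.println(gatherPasses(32, 16)); // scenario 2
    }
}
```

With the index array from the quoted example, the model needs 4 passes for 
`MaxVectorSize = 16` and 2 passes for `MaxVectorSize = 32`, matching the 
gather-load counts listed in scenarios 1 and 2.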

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24679#discussion_r2053187161
