On Wed, 17 Jun 2026 09:27:19 GMT, Ferenc Rakoczi <[email protected]> wrote:

>> Addendum:
>> 
>> There is no need to include the static assert for even length in the 
>> overloaded variant of `vs_st2_post`, i.e. you can just use this:
>> 
>> 
>>   // store 2 x N-vector sequences interleaved into 2 * N quadword
>>   // memory locations via the address supplied in base using
>>   // post-increment addressing.
>>   template<int N>
>>   void vs_st2_post(const VSeq<N>& v1, const VSeq<N>& v2,
>>                    Assembler::SIMD_Arrangement T, Register base) {
>>     for (int i = 0; i < N; i++) {
>>       __ st2(v1[i], v2[i], T, __ post(base, 32));
>>     }
>>   }
>
> This idea sounds great, but, unfortunately, `st2(v1, v2, T, post(base, 32))`
> requires the register index of v2 to be one more than that of v1, so I just
> added a comment at the point where the data is consumed.
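
For reference, here is a plain C++ model of what one `st2` with post-increment does for the 4S arrangement. It is a simplified sketch, not HotSpot code: `Vec4S`, `VRegFile` and `st2_post_4s` are hypothetical names, and registers are modelled as four 32-bit lanes each. It shows why the constraint exists: the ST2 encoding carries only the first register number, and the second is implicitly the next register (mod 32).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model, not HotSpot code: each 128-bit vector register is
// viewed as four 32-bit lanes (the T4S arrangement).
using Vec4S = std::array<uint32_t, 4>;
using VRegFile = std::array<Vec4S, 32>;  // v0 .. v31

// Models "st2 {vT.4S, vT2.4S}, [base], #32". The instruction encodes only
// the first register; the second is implicitly (T + 1) mod 32, which is
// why the two source registers must be adjacent.
void st2_post_4s(const VRegFile& v, int t, int t2,
                 std::vector<uint32_t>& mem, std::size_t& base) {
  assert(t2 == (t + 1) % 32 && "ST2 needs consecutive registers");
  for (std::size_t lane = 0; lane < 4; lane++) {
    // Lanes are interleaved in memory: vT[0], vT2[0], vT[1], vT2[1], ...
    mem[base + 2 * lane] = v[t][lane];
    mem[base + 2 * lane + 1] = v[t2][lane];
  }
  base += 8;  // post-increment by 32 bytes == 8 32-bit words
}
```

With v16 holding {0, 2, 4, 6} and v17 holding {1, 3, 5, 7}, a call with t = 16, t2 = 17 writes words 0 through 7 in order and advances base by 8 words; pairing v16 with, say, v20 trips the assert.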

Ah, yes. That's a shame, as storing the data in canonical order would be much 
better. We could salvage this by redeclaring the sequences so that each pair 
of corresponding elements A[i] and D[i] occupies adjacent vector registers.

  VSeq<4> A(16, 2);   // v16, v18, v20, v22
  VSeq<4> D(17, 2);   // v17, v19, v21, v23
  VSeq<4> B(24);      // v24 .. v27
  VSeq<4> C(28);      // v28 .. v31

or equivalently

  VSeq<8> A_D(16);            // v16 .. v23
  VSeq<4> A = vs_even(A_D);   // v16, v18, v20, v22
  VSeq<4> D = vs_odd(A_D);    // v17, v19, v21, v23
  VSeq<4> B(24);
  VSeq<4> C(28);

but then a reader has to work out why the registers are interleaved this way.
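
The equivalence of the two declarations, and the fact that they make every A[i], D[i] pair usable by `st2`, can be checked at compile time with a small sketch. `RegSeq`, `seq_even`, `seq_odd` and `st2_pairable` are hypothetical stand-ins that track only register numbers; they assume `VSeq<N>(base, delta)` yields base, base + delta, ... and that `vs_even`/`vs_odd` pick alternate elements, as the declarations above imply. The real HotSpot classes produce FloatRegister values instead.

```cpp
// Hypothetical stand-in for HotSpot's VSeq<N>: it records only register
// numbers (base, base + delta, base + 2 * delta, ...), not FloatRegisters.
template <int N>
struct RegSeq {
  int base;
  int delta;
  constexpr RegSeq(int b, int d = 1) : base(b), delta(d) {}
  constexpr int operator[](int i) const { return base + i * delta; }
};

// Stand-ins for vs_even / vs_odd: every other element of a sequence.
template <int N>
constexpr RegSeq<N / 2> seq_even(const RegSeq<N>& s) {
  return RegSeq<N / 2>(s.base, 2 * s.delta);
}
template <int N>
constexpr RegSeq<N / 2> seq_odd(const RegSeq<N>& s) {
  return RegSeq<N / 2>(s.base + s.delta, 2 * s.delta);
}

// True when d[i] == a[i] + 1 for every i, i.e. each pair can go to st2.
template <int N>
constexpr bool st2_pairable(const RegSeq<N>& a, const RegSeq<N>& d) {
  for (int i = 0; i < N; i++) {
    if (d[i] != a[i] + 1) return false;
  }
  return true;
}

// The two declarations from the mail, expressed with the stand-ins.
constexpr RegSeq<4> A(16, 2);  // v16, v18, v20, v22
constexpr RegSeq<4> D(17, 2);  // v17, v19, v21, v23
constexpr RegSeq<8> A_D(16);   // v16 .. v23
static_assert(st2_pairable(A, D), "A[i] and D[i] must be adjacent");
static_assert(seq_even(A_D)[0] == A[0] && seq_odd(A_D)[3] == D[3],
              "both declarations name the same registers");
```

A plain `RegSeq<4>(16)` paired with `RegSeq<4>(20)` fails `st2_pairable`, which is the situation the original, contiguous declarations are in.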

I'll settle for your comment that restates the permuted layout just prior to 
the ldrs.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/30941#discussion_r3427880307
