Re: RFR: 8360934: Add AVX-512 intrinsics for ML-KEM - enhancement on AVX512_VBMI and AVX512_VBMI2 [v2]

Ferenc Rakoczi Wed, 07 Jan 2026 09:51:13 -0800

On Wed, 7 Jan 2026 16:43:30 GMT, Volodymyr Paprotski <[email protected]> 
wrote:


>> I believe the numbers are right: with each pass 256 bytes of coefficients 
>> are `parsed` into the parse buffer.  This means that half of the 
>> coefficients have been processed (`parsedLength` = 128).  Would a comment 
>> stating this address your concerns?
>
> I wasn't clear in my question. The asm is indeed processing the bytes in 
> the increment. What I was trying to convince myself about was: 'how come we 
> are not reading past the end of the array? Or are we?'
> 
> On one hand, this is exactly what the existing asm code does, so I will 
> assume that it's correct. However, on the Java side/version of this code, I 
> could only convince myself about processing ~two AVX512 vectors at a time, 
> not four.
> 
> So either I can't count, or there are some further (implicit) restrictions 
> on the callers of `twelve2Sixteen`.

In ML_KEM.java there is this assert (and this is the only call to 
implKyber12To16()):

        assert ((remainder == 0) || (remainder == 48)) &&
                (index + i * 96 <= condensed.length);
        implKyber12To16(condensed, index, parsed, parsedLength);

and one can check how the callers of twelve2Sixteen() make sure that this is 
the case.
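
A minimal sketch of what that assert guards, for readers following the 
bounds argument. Everything here is illustrative: the class and method names 
are hypothetical, and the way `i` is derived (the number of 96-byte input 
blocks, with a partial block rounded up) is an assumption, since its 
definition is not quoted in this thread. The only fact relied on is that 64 
coefficients of 12 bits each occupy 96 bytes.

```java
// Illustrative only: names and the derivation of `i` are assumptions,
// not the actual ML_KEM.java code.
final class TwelveToSixteenBounds {
    // 64 coefficients * 12 bits = 768 bits = 96 bytes of condensed input.
    static final int COEFFS_PER_BLOCK = 64;
    static final int BYTES_PER_BLOCK = 96;

    // Number of 96-byte blocks consumed, rounding a partial block up (assumed).
    static int blocksNeeded(int parsedLength) {
        return (parsedLength + COEFFS_PER_BLOCK - 1) / COEFFS_PER_BLOCK;
    }

    // Mirrors the shape of the quoted assert: the remainder must be 0 or 48,
    // and whole blocks starting at `index` must fit inside the input array.
    static boolean readsStayInBounds(int condensedLength, int index, int parsedLength) {
        int remainder = parsedLength % COEFFS_PER_BLOCK;
        int i = blocksNeeded(parsedLength);
        return (remainder == 0 || remainder == 48)
                && index + i * BYTES_PER_BLOCK <= condensedLength;
    }

    public static void main(String[] args) {
        System.out.println(readsStayInBounds(192, 0, 128)); // true: two full blocks fit
        System.out.println(readsStayInBounds(191, 0, 128)); // false: one byte short
        System.out.println(readsStayInBounds(168, 0, 112)); // false: partial block is rounded up
    }
}
```

Under this reading, a caller asking for 112 coefficients must still supply 
192 readable bytes rather than the 168 that are strictly needed, which is the 
kind of implicit restriction on callers being asked about.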

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28815#discussion_r2669490940
