On Mon, 17 Nov 2025 06:44:39 GMT, Jatin Bhateja <[email protected]> wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with two >> additional commits since the last revision: >> >> - whitespace >> - address first comments > > src/hotspot/cpu/x86/assembler_x86.cpp line 3867: > >> 3865: (vector_len == AVX_256bit ? VM_Version::supports_avx2() : >> 3866: (vector_len == AVX_512bit ? VM_Version::supports_evex() : >> false)), ""); >> 3867: InstructionAttr attributes(vector_len, /* vex_w */ false, /* >> legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true); > > When you check for AVX512-VL you allow accessing 128/256 bit registers from > the higher register bank [X/Y]MM(16-31) > > But your assertions are nowhere checking this. I believe those asserts are in `vex_prefix_and_encode` (https://github.com/openjdk/jdk/blob/6d3f7794ee6658d48eb2120c7bfe66ac412c6d14/src/hotspot/cpu/x86/assembler_x86.cpp#L13164) and `vex_prefix` (https://github.com/openjdk/jdk/blob/6d3f7794ee6658d48eb2120c7bfe66ac412c6d14/src/hotspot/cpu/x86/assembler_x86.cpp#L13047) I also haven't found any other instruction that does this check so I could emulate the style. > src/hotspot/cpu/x86/assembler_x86.cpp line 3882: > >> 3880: >> 3881: void Assembler::evmovsldup(XMMRegister dst, KRegister mask, >> XMMRegister src, bool merge, int vector_len) { >> 3882: assert(VM_Version::supports_evex(), ""); > > Suggestion: > > assert(vector_len == AVX_512 || VM_Version::supports_avx512vl), ""); Took the patch, but also kept the supports_evex() assert > test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java line 114: > >> 112: rnd.setSeed(seed); >> 113: //Note: it might be useful to increase this number during >> development of new intrinsics >> 114: final int repeat = 10000000; > > Instead of high repetition count can you try tuning the tiered compilation > threshold. The purpose of the test is to test various (pseudo-random) values and compare the results to the java implementation of same code. A single run-though of the test doesn't always prove that there are no bugs. A bit philosophical.. as is well known, when writing crypto, branches (conditional on secret) are disallowed; but e.g. carry propagation has the same 'conditional execution' effect. (Instead of "have you tested every branch direction" its "have you tested every carry") Besides a very careful range/overflow analysis (which I also did.. ntt functions skate very close to the int limit), exhaustive fuzz testing is the best method to find conditions that manual (range/overflow) analysis hasn't found; fuzz testing has very little math built in, so its also good at finding 'blind spots' I (and whomever has to review) might have not thought of.. > test/jdk/sun/security/provider/acvp/ML_DSA_Intrinsic_Test.java line 145: > >> 143: coeffs1[j] = rnd.nextInt(); >> 144: coeffs2[j] = rnd.nextInt(); >> 145: } > > You can uses generators for randome initialization of array I think you meant this? coeffs1 = rnd.ints(ML_DSA_N).toArray(); coeffs2 = rnd.ints(ML_DSA_N).toArray(); Didn't know about this, thanks. It does work.. But the original purpose (perhaps misguided, but its done) was to 'factor out' the allocations; the outer loop runs many million times (I've left it running for 6+hours during development) and so I wanted a 'somewhat efficient' test. In hindsight, these (1k) arrays could probably be stack allocated, but I did not want to depend on an optimization when I could just write it without allocations in the mainline ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2535460279 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2535804056 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2535373444 PR Review Comment: https://git.openjdk.org/jdk/pull/28136#discussion_r2535199249
