Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions

2022-10-18 Thread Sandhya Viswanathan
On Wed, 5 Oct 2022 21:28:26 GMT, vpaprotsk wrote: > Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 > message blocks at a time. For more details, left a lot of comments in > `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v3]

2022-10-24 Thread Sandhya Viswanathan
On Fri, 21 Oct 2022 20:20:58 GMT, vpaprotsk wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to co

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v3]

2022-10-24 Thread Sandhya Viswanathan
On Fri, 21 Oct 2022 20:20:58 GMT, vpaprotsk wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to co

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5]

2022-10-24 Thread Sandhya Viswanathan
On Mon, 24 Oct 2022 22:09:29 GMT, vpaprotsk wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to co

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5]

2022-10-25 Thread Sandhya Viswanathan
On Mon, 24 Oct 2022 22:09:29 GMT, vpaprotsk wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz test to co

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5]

2022-11-04 Thread Sandhya Viswanathan
On Fri, 4 Nov 2022 20:59:10 GMT, Volodymyr Paprotski wrote: >> src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 175: >> >>> 173: // Choice of 1024 is arbitrary, need enough data blocks to >>> amortize conversion overhead >>> 174: // and not affect p

Re: RFR: 8247645: ChaCha20 intrinsics

2022-11-06 Thread Sandhya Viswanathan
On Fri, 4 Mar 2022 16:47:54 GMT, Jamil Nimeh wrote: > This PR delivers ChaCha20 intrinsics that accelerate the core block function > that generates key stream from the key, counter and nonce. Intrinsics have > been written for the following platforms and instruction sets: > > - x86_64: AVX, A

Re: RFR: 8247645: ChaCha20 intrinsics [v3]

2022-11-10 Thread Sandhya Viswanathan
On Thu, 10 Nov 2022 20:12:30 GMT, Jamil Nimeh wrote: >> Jamil Nimeh has updated the pull request incrementally with one additional >> commit since the last revision: >> >> replace hi/lo word shuffles and left-right shift/or operations for vpshufd >> on byte-aligned rotations > > using vpshuf

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v11]

2022-11-10 Thread Sandhya Viswanathan
On Thu, 10 Nov 2022 01:22:04 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14]

2022-11-10 Thread Sandhya Viswanathan
On Fri, 11 Nov 2022 01:14:05 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14]

2022-11-10 Thread Sandhya Viswanathan
On Fri, 11 Nov 2022 01:14:05 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8247645: ChaCha20 intrinsics [v3]

2022-11-14 Thread Sandhya Viswanathan
On Thu, 10 Nov 2022 20:11:46 GMT, Jamil Nimeh wrote: >> This PR delivers ChaCha20 intrinsics that accelerate the core block function >> that generates key stream from the key, counter and nonce. Intrinsics have >> been written for the following platforms and instruction sets: >> >> - x86_64:

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-14 Thread Sandhya Viswanathan
On Tue, 15 Nov 2022 00:10:35 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains 23 commits: >> >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - Vladimir's re

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Sandhya Viswanathan
On Wed, 16 Nov 2022 20:52:14 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8247645: ChaCha20 intrinsics [v3]

2022-11-17 Thread Sandhya Viswanathan
On Thu, 10 Nov 2022 20:11:46 GMT, Jamil Nimeh wrote: >> This PR delivers ChaCha20 intrinsics that accelerate the core block function >> that generates key stream from the key, counter and nonce. Intrinsics have >> been written for the following platforms and instruction sets: >> >> - x86_64:

Re: RFR: 8297379: Enable the ByteBuffer path of Poly1305 optimizations

2022-11-28 Thread Sandhya Viswanathan
On Wed, 23 Nov 2022 23:33:32 GMT, Volodymyr Paprotski wrote: > Regarding mainline: > - I decided not to 'unroll' the top while loop (i.e. `engineUpdate(byte[] > input, int offset, int len)` is unrolled) > - It is debatable which version is easier to understand. If this version > is 'too comp

Re: RFR: 8297379: Enable the ByteBuffer path of Poly1305 optimizations [v3]

2022-12-01 Thread Sandhya Viswanathan
On Thu, 1 Dec 2022 18:23:45 GMT, Volodymyr Paprotski wrote: >> There is now an intrinsic for Poly1305, which is only enabled on the >> `engineUpdate([]byte)` path. This PR adds intrinsic support >> `engineUpdate(ByteBuffer)` (when the bytebuffer `hasArray`). >> >> Fuzzing test expanded to also

Re: RFR: 8297379: Enable the ByteBuffer path of Poly1305 optimizations [v3]

2022-12-01 Thread Sandhya Viswanathan
On Thu, 1 Dec 2022 18:23:45 GMT, Volodymyr Paprotski wrote: >> There is now an intrinsic for Poly1305, which is only enabled on the >> `engineUpdate([]byte)` path. This PR adds intrinsic support >> `engineUpdate(ByteBuffer)` (when the bytebuffer `hasArray`). >> >> Fuzzing test expanded to also

Re: RFR: 8297379: Enable the ByteBuffer path of Poly1305 optimizations [v3]

2022-12-05 Thread Sandhya Viswanathan
On Thu, 1 Dec 2022 18:23:45 GMT, Volodymyr Paprotski wrote: >> There is now an intrinsic for Poly1305, which is only enabled on the >> `engineUpdate([]byte)` path. This PR adds intrinsic support >> `engineUpdate(ByteBuffer)` (when the bytebuffer `hasArray`). >> >> Fuzzing test expanded to also

Re: RFR: 8314085: Fixing scope from benchmark to thread for JMH tests having shared state

2023-08-31 Thread Sandhya Viswanathan
On Thu, 10 Aug 2023 15:30:19 GMT, Swati Sharma wrote: > In addition to the issue > [JDK-8311178](https://bugs.openjdk.org/browse/JDK-8311178), logically fixing > the scope from benchmark to thread for below benchmark files having shared > state, also which fixes few of the benchmarks scalabili

Re: RFR: 8314085: Fixing scope from benchmark to thread for JMH tests having shared state

2023-09-05 Thread Sandhya Viswanathan
On Thu, 10 Aug 2023 15:30:19 GMT, Swati Sharma wrote: > In addition to the issue > [JDK-8311178](https://bugs.openjdk.org/browse/JDK-8311178), logically fixing > the scope from benchmark to thread for below benchmark files having shared > state, also which fixes few of the benchmarks scalabili

Re: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v2]

2023-09-13 Thread Sandhya Viswanathan
On Wed, 13 Sep 2023 20:25:22 GMT, Smita Kamath wrote: >> Hi All, >> I would like to submit AES-GCM optimization for x86_64 architectures using >> AVX2 instructions. This optimization interleaves AES and GHASH operations. >> >> Below are the performance numbers on my desktop system with -XX:Use

Re: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v2]

2023-09-13 Thread Sandhya Viswanathan
On Wed, 13 Sep 2023 20:25:22 GMT, Smita Kamath wrote: >> Hi All, >> I would like to submit AES-GCM optimization for x86_64 architectures using >> AVX2 instructions. This optimization interleaves AES and GHASH operations. >> >> Below are the performance numbers on my desktop system with -XX:Use

Re: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v2]

2023-09-22 Thread Sandhya Viswanathan
On Wed, 13 Sep 2023 20:25:22 GMT, Smita Kamath wrote: >> Hi All, >> I would like to submit AES-GCM optimization for x86_64 architectures using >> AVX2 instructions. This optimization interleaves AES and GHASH operations. >> >> Below are the performance numbers on my desktop system with -XX:Use

Re: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v2]

2023-09-22 Thread Sandhya Viswanathan
On Wed, 13 Sep 2023 20:25:22 GMT, Smita Kamath wrote: >> Hi All, >> I would like to submit AES-GCM optimization for x86_64 architectures using >> AVX2 instructions. This optimization interleaves AES and GHASH operations. >> >> Below are the performance numbers on my desktop system with -XX:Use

Re: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v2]

2023-09-22 Thread Sandhya Viswanathan
On Wed, 13 Sep 2023 20:25:22 GMT, Smita Kamath wrote: >> Hi All, >> I would like to submit AES-GCM optimization for x86_64 architectures using >> AVX2 instructions. This optimization interleaves AES and GHASH operations. >> >> Below are the performance numbers on my desktop system with -XX:Use

Re: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v2]

2023-09-25 Thread Sandhya Viswanathan
On Wed, 13 Sep 2023 20:25:22 GMT, Smita Kamath wrote: >> Hi All, >> I would like to submit AES-GCM optimization for x86_64 architectures using >> AVX2 instructions. This optimization interleaves AES and GHASH operations. >> >> Below are the performance numbers on my desktop system with -XX:Use

Re: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v7]

2023-10-10 Thread Sandhya Viswanathan
On Tue, 10 Oct 2023 23:49:18 GMT, Smita Kamath wrote: >> Hi All, >> I would like to submit AES-GCM optimization for x86_64 architectures using >> AVX2 instructions. This optimization interleaves AES and GHASH operations. >> >> Below are the performance numbers on my desktop system with -XX:Use

Re: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v8]

2023-10-17 Thread Sandhya Viswanathan
On Wed, 11 Oct 2023 22:05:08 GMT, Smita Kamath wrote: >> Hi All, >> I would like to submit AES-GCM optimization for x86_64 architectures using >> AVX2 instructions. This optimization interleaves AES and GHASH operations. >> >> Below are the performance numbers on my desktop system with -XX:Use

Re: RFR: JDK-8314901: AES-GCM interleaved implementation using AVX2 instructions [v8]

2023-10-18 Thread Sandhya Viswanathan
On Wed, 11 Oct 2023 22:05:08 GMT, Smita Kamath wrote: >> Hi All, >> I would like to submit AES-GCM optimization for x86_64 architectures using >> AVX2 instructions. This optimization interleaves AES and GHASH operations. >> >> Below are the performance numbers on my desktop system with -XX:Use

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v9]

2024-05-16 Thread Sandhya Viswanathan
On Fri, 10 May 2024 00:19:32 GMT, Volodymyr Paprotski wrote: >> Performance. Before: >> >> Benchmark (algorithm) (dataSize) (keyLength) >> (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11]

2024-05-17 Thread Sandhya Viswanathan
On Fri, 17 May 2024 21:16:47 GMT, Volodymyr Paprotski wrote: >> Performance. Before: >> >> Benchmark (algorithm) (dataSize) (keyLength) >> (provider) Mode Cnt Score Error Units >> SignatureBench.ECDSA.sign SHA256withECDSA 1024 256

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v2]

2024-06-14 Thread Sandhya Viswanathan
On Fri, 14 Jun 2024 22:01:44 GMT, Volodymyr Paprotski wrote: >> This fix recovers XDH performance but removes some of the P256 gains >> (~-8-14%). Still faster, but not as much. >> >> The fix is to undo 'int' return type on mult()/square(), which allowed to >> return partially reduced result (

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]

2024-06-17 Thread Sandhya Viswanathan
On Mon, 17 Jun 2024 16:38:55 GMT, Volodymyr Paprotski wrote: >> This fix recovers XDH performance but removes some of the P256 gains >> (~-8-14%). Still faster, but not as much. >> >> The fix is to undo 'int' return type on mult()/square(), which allowed to >> return partially reduced result (

Re: [jdk23] RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538

2024-06-26 Thread Sandhya Viswanathan
itory. > > The commit being backported was authored by Volodymyr Paprotski on 25 Jun > 2024 and was reviewed by Sandhya Viswanathan, Vladimir Kozlov, Ferenc Rakoczi > and Anthony Scarpino. > > Thanks! Marked as reviewed by sviswanathan (Reviewer). - PR Review: https

Re: RFR: 8344144: AES/CBC slow at big payloads [v5]

2024-11-19 Thread Sandhya Viswanathan
On Tue, 19 Nov 2024 17:08:35 GMT, Volodymyr Paprotski wrote: >> Measuring throughput with JMH parameters `-f 1 -i 2 -wi 3 -r 20 -w 30 -p >> algorithm=AES/CBC/NoPadding -p dataSize=3000 -p provider=SunJCE -p >> keyLength=128 org.openjdk.bench.javax.crypto.full.AESBench` >> >> Before: >>

Re: RFR: 8344144: AES/CBC slow at big payloads [v4]

2024-11-19 Thread Sandhya Viswanathan
On Tue, 19 Nov 2024 00:24:04 GMT, Volodymyr Paprotski wrote: >> Measuring throughput with JMH parameters `-f 1 -i 2 -wi 3 -r 20 -w 30 -p >> algorithm=AES/CBC/NoPadding -p dataSize=3000 -p provider=SunJCE -p >> keyLength=128 org.openjdk.bench.javax.crypto.full.AESBench` >> >> Before: >>

Re: RFR: 8344144: AES/CBC slow at big payloads [v6]

2024-11-19 Thread Sandhya Viswanathan
On Tue, 19 Nov 2024 17:50:23 GMT, Volodymyr Paprotski wrote: >> Measuring throughput with JMH parameters `-f 1 -i 2 -wi 3 -r 20 -w 30 -p >> algorithm=AES/CBC/NoPadding -p dataSize=3000 -p provider=SunJCE -p >> keyLength=128 org.openjdk.bench.javax.crypto.full.AESBench` >> >> Before: >>

Re: RFR: 8344144: AES/CBC slow at big payloads [v7]

2024-11-19 Thread Sandhya Viswanathan
On Tue, 19 Nov 2024 17:57:01 GMT, Volodymyr Paprotski wrote: >> Measuring throughput with JMH parameters `-f 1 -i 2 -wi 3 -r 20 -w 30 -p >> algorithm=AES/CBC/NoPadding -p dataSize=3000 -p provider=SunJCE -p >> keyLength=128 org.openjdk.bench.javax.crypto.full.AESBench` >> >> Before: >>

Re: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v13]

2025-04-04 Thread Sandhya Viswanathan
On Wed, 2 Apr 2025 07:38:34 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the >> ML-DSA algorithms (key generation, document signing, signature verification) >> can be approximately doubled. > > Ferenc Rakoczi has updated the pull request in

Re: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v13]

2025-04-07 Thread Sandhya Viswanathan
On Wed, 2 Apr 2025 07:38:34 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the >> ML-DSA algorithms (key generation, document signing, signature verification) >> can be approximately doubled. > > Ferenc Rakoczi has updated the pull request in

Re: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v14]

2025-04-08 Thread Sandhya Viswanathan
On Tue, 8 Apr 2025 21:27:08 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the >> ML-DSA algorithms (key generation, document signing, signature verification) >> can be approximately doubled. > > Ferenc Rakoczi has updated the pull request in

Re: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v14]

2025-04-09 Thread Sandhya Viswanathan
On Wed, 9 Apr 2025 17:09:09 GMT, Ferenc Rakoczi wrote: >> Overall very clean and nicely done PR. Thanks a lot for considering my >> inputs. > >> Overall very clean and nicely done PR. Thanks a lot for considering my >> inputs. > > That is in no small part thanks to the reviewers, especially to

Re: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v12]

2025-04-01 Thread Sandhya Viswanathan
On Mon, 31 Mar 2025 14:40:56 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the >> ML-DSA algorithms (key generation, document signing, signature verification) >> can be approximately doubled. > > Ferenc Rakoczi has updated the pull request i

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64

2025-03-03 Thread Sandhya Viswanathan
On Thu, 20 Feb 2025 21:49:42 GMT, Volodymyr Paprotski wrote: > Add AVX2 montgomery multiplication intrinsic. (About 60-80% gain) > > Also add reduction to existing AVX512 multiplication (this was left-over from > https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). > Thi

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v4]

2025-03-05 Thread Sandhya Viswanathan
On Wed, 5 Mar 2025 23:03:23 GMT, Volodymyr Paprotski wrote: >> Add AVX2 montgomery multiplication intrinsic. (About 60-80% gain) >> >> Also add reduction to existing AVX512 multiplication (this was left-over >> from https://github.com/openjdk/jdk/pull/19893 where a quick fix was >> required).

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64

2025-02-27 Thread Sandhya Viswanathan
On Thu, 20 Feb 2025 21:49:42 GMT, Volodymyr Paprotski wrote: > Add AVX2 montgomery multiplication intrinsic. (About 60-80% gain) > > Also add reduction to existing AVX512 multiplication (this was left-over from > https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). > Thi

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64

2025-02-26 Thread Sandhya Viswanathan
On Thu, 20 Feb 2025 21:49:42 GMT, Volodymyr Paprotski wrote: > Add AVX2 montgomery multiplication intrinsic. (About 60-80% gain) > > Also add reduction to existing AVX512 multiplication (this was left-over from > https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). > Thi

Re: RFR: 8353671: Remove dead code missed in JDK-8350459

2025-04-03 Thread Sandhya Viswanathan
On Thu, 3 Apr 2025 18:42:35 GMT, Volodymyr Paprotski wrote: > 8353671: Remove dead code missed in JDK-8350459 Marked as reviewed by sviswanathan (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/24423#pullrequestreview-2741373475

Re: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v4]

2025-05-13 Thread Sandhya Viswanathan
On Mon, 12 May 2025 09:05:10 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the >> ML-KEM algorithms (key generation, encapsulation, decapsulation) can be >> approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally

Re: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v4]

2025-05-14 Thread Sandhya Viswanathan
On Wed, 14 May 2025 11:41:30 GMT, Ferenc Rakoczi wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 696: >> >>> 694: address generate_kyberAddPoly_2_avx512(StubGenerator *stubgen, >>> 695: MacroAssembler *_masm) { >>> 696: >> >> The Java co

Re: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v6]

2025-05-20 Thread Sandhya Viswanathan
On Tue, 20 May 2025 11:51:49 GMT, Ferenc Rakoczi wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 250: >> >>> 248: static void montmul(int outputRegs[], int inputRegs1[], int >>> inputRegs2[], >>> 249: int scratchRegs1[], int scratchRegs2[], MacroAssembler >>> *_m

Re: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v7]

2025-05-20 Thread Sandhya Viswanathan
On Tue, 20 May 2025 17:49:14 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the >> ML-KEM algorithms (key generation, encapsulation, decapsulation) can be >> approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally

Re: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v6]

2025-05-20 Thread Sandhya Viswanathan
On Thu, 15 May 2025 13:33:42 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the >> ML-KEM algorithms (key generation, encapsulation, decapsulation) can be >> approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally

Re: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v6]

2025-05-15 Thread Sandhya Viswanathan
On Thu, 15 May 2025 13:33:42 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the >> ML-KEM algorithms (key generation, encapsulation, decapsulation) can be >> approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally

Re: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v5]

2025-05-15 Thread Sandhya Viswanathan
On Thu, 15 May 2025 00:36:26 GMT, Sandhya Viswanathan wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Responding to comments by Sandhya. > > Another minor comment. Rest of the PR

Re: RFR: 8351412: Add AVX-512 intrinsics for ML-KEM [v5]

2025-05-14 Thread Sandhya Viswanathan
On Wed, 14 May 2025 11:49:11 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the >> ML-KEM algorithms (key generation, encapsulation, decapsulation) can be >> approximately doubled. > > Ferenc Rakoczi has updated the pull request incrementally