On Wed, 5 Oct 2022 21:28:26 GMT, vpaprotsk wrote:
> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16
> message blocks at a time. For more details, left a lot of comments in
> `macroAssembler_x86_poly.cpp`.
>
> - Added new KAT test for Poly1305 and a fuzz test to compare
On Fri, 21 Oct 2022 20:20:58 GMT, vpaprotsk wrote:
>> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16
>> message blocks at a time. For more details, left a lot of comments in
>> `macroAssembler_x86_poly.cpp`.
>>
>> - Added new KAT test for Poly1305 and a fuzz test to co
On Mon, 24 Oct 2022 22:09:29 GMT, vpaprotsk wrote:
>> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16
>> message blocks at a time. For more details, left a lot of comments in
>> `macroAssembler_x86_poly.cpp`.
>>
>> - Added new KAT test for Poly1305 and a fuzz test to co
On Fri, 4 Nov 2022 20:59:10 GMT, Volodymyr Paprotski wrote:
>> src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 175:
>>
>>> 173: // Choice of 1024 is arbitrary, need enough data blocks to amortize conversion overhead
>>> 174: // and not affect p
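The snippets above describe processing 16 Poly1305 message blocks per iteration. One way such multi-block processing can work (an assumption here, not confirmed by this thread) is to regroup the recurrence h = (h + m_i)·r mod p with precomputed powers of r, so that the per-block products become independent and can sit in separate vector lanes. Below is a minimal Java sketch using `BigInteger`; `PolyLanesDemo`, `sequential` and `multiBlock` are hypothetical names, and key clamping, the final `s` addition and tag encoding are omitted.

```java
import java.math.BigInteger;
import java.util.Random;

public class PolyLanesDemo {
    // Poly1305 prime p = 2^130 - 5
    static final BigInteger P = BigInteger.ONE.shiftLeft(130).subtract(BigInteger.valueOf(5));

    // Reference: one block per step, h = (h + m) * r mod p
    static BigInteger sequential(BigInteger[] blocks, BigInteger r) {
        BigInteger h = BigInteger.ZERO;
        for (BigInteger m : blocks) {
            h = h.add(m).multiply(r).mod(P);
        }
        return h;
    }

    // Hypothetical k-blocks-per-step variant using precomputed r^1..r^k:
    //   h' = (h + m_1) r^k + m_2 r^(k-1) + ... + m_k r   (mod p)
    // Assumes blocks.length is a multiple of k.
    static BigInteger multiBlock(BigInteger[] blocks, BigInteger r, int k) {
        BigInteger[] pow = new BigInteger[k + 1];
        pow[0] = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            pow[i] = pow[i - 1].multiply(r).mod(P);
        }
        BigInteger h = BigInteger.ZERO;
        for (int base = 0; base < blocks.length; base += k) {
            BigInteger acc = h.add(blocks[base]).multiply(pow[k]);
            for (int j = 1; j < k; j++) {
                // these products do not depend on each other, so lanes could compute them in parallel
                acc = acc.add(blocks[base + j].multiply(pow[k - j]));
            }
            h = acc.mod(P);
        }
        return h;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        BigInteger r = new BigInteger(124, rnd); // arbitrary r < p (real Poly1305 clamps r)
        BigInteger[] blocks = new BigInteger[64];
        for (int i = 0; i < blocks.length; i++) {
            // a full 16-byte block plus the 2^128 padding bit
            blocks[i] = new BigInteger(128, rnd).setBit(128);
        }
        System.out.println(sequential(blocks, r).equals(multiBlock(blocks, r, 16)));
    }
}
```

Because the regrouping is an algebraic identity modulo p, `main` prints `true` for any k that divides the block count.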
On Fri, 4 Mar 2022 16:47:54 GMT, Jamil Nimeh wrote:
> This PR delivers ChaCha20 intrinsics that accelerate the core block function
> that generates key stream from the key, counter and nonce. Intrinsics have
> been written for the following platforms and instruction sets:
>
> - x86_64: AVX, A
On Thu, 10 Nov 2022 20:12:30 GMT, Jamil Nimeh wrote:
>> Jamil Nimeh has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> replace hi/lo word shuffles and left-right shift/or operations for vpshufd
>> on byte-aligned rotations
>
> using vpshuf
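The commit message above concerns byte-aligned rotations. ChaCha20's quarter round rotates by 16, 12, 8 and 7 bits; the 16- and 8-bit rotations move whole bytes, so they can be expressed as one byte permutation instead of two shifts and an OR. A scalar Java sketch of that equivalence follows; `ByteRotateDemo`, `rotlByShuffle` and `quarterRound` are hypothetical names, and the vector instruction the intrinsic actually uses is not shown.

```java
public class ByteRotateDemo {
    // Rotate left by a multiple of 8 bits by permuting bytes (as a byte
    // shuffle would) rather than with shift-left, shift-right and OR.
    static int rotlByShuffle(int x, int bits) {
        if (bits % 8 != 0) throw new IllegalArgumentException("byte-aligned rotations only");
        int k = (bits / 8) & 3;             // rotation amount in whole bytes
        int out = 0;
        for (int i = 0; i < 4; i++) {
            int src = (i + 4 - k) & 3;      // source byte index, 0 = least significant
            int b = (x >>> (8 * src)) & 0xFF;
            out |= b << (8 * i);
        }
        return out;
    }

    // ChaCha20 quarter round (RFC 8439); the 16- and 8-bit rotations use the byte permutation
    static void quarterRound(int[] s, int a, int b, int c, int d) {
        s[a] += s[b]; s[d] ^= s[a]; s[d] = rotlByShuffle(s[d], 16);
        s[c] += s[d]; s[b] ^= s[c]; s[b] = Integer.rotateLeft(s[b], 12);
        s[a] += s[b]; s[d] ^= s[a]; s[d] = rotlByShuffle(s[d], 8);
        s[c] += s[d]; s[b] ^= s[c]; s[b] = Integer.rotateLeft(s[b], 7);
    }

    public static void main(String[] args) {
        // quarter-round test vector from RFC 8439, section 2.1.1
        int[] s = {0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567};
        quarterRound(s, 0, 1, 2, 3);
        System.out.printf("%08x %08x %08x %08x%n", s[0], s[1], s[2], s[3]);
    }
}
```

Running `main` prints `ea2a92f4 cb1cf8ce 4581472e 5881c4bb`, the expected output listed in RFC 8439.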
On Thu, 10 Nov 2022 01:22:04 GMT, Volodymyr Paprotski wrote:
>> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16
>> message blocks at a time. For more details, left a lot of comments in
>> `macroAssembler_x86_poly.cpp`.
>>
>> - Added new KAT test for Poly1305 and a fuzz
On Fri, 11 Nov 2022 01:14:05 GMT, Volodymyr Paprotski wrote:
>> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16
>> message blocks at a time. For more details, left a lot of comments in
>> `macroAssembler_x86_poly.cpp`.
>>
>> - Added new KAT test for Poly1305 and a fuzz
On Thu, 10 Nov 2022 20:11:46 GMT, Jamil Nimeh wrote:
>> This PR delivers ChaCha20 intrinsics that accelerate the core block function
>> that generates key stream from the key, counter and nonce. Intrinsics have
>> been written for the following platforms and instruction sets:
>>
>> - x86_64:
On Tue, 15 Nov 2022 00:10:35 GMT, Vladimir Ivanov wrote:
>> Volodymyr Paprotski has updated the pull request with a new target base due
>> to a merge or a rebase. The pull request now contains 23 commits:
>>
>> - Merge remote-tracking branch 'origin/master' into avx512-poly
>> - Vladimir's re
On Wed, 16 Nov 2022 20:52:14 GMT, Volodymyr Paprotski wrote:
>> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16
>> message blocks at a time. For more details, left a lot of comments in
>> `macroAssembler_x86_poly.cpp`.
>>
>> - Added new KAT test for Poly1305 and a fuzz
On Wed, 23 Nov 2022 23:33:32 GMT, Volodymyr Paprotski wrote:
> Regarding mainline:
> - I decided not to 'unroll' the top while loop (i.e. `engineUpdate(byte[] input, int offset, int len)` is unrolled)
> - It is debatable which version is easier to understand. If this version
> is 'too comp
On Thu, 1 Dec 2022 18:23:45 GMT, Volodymyr Paprotski wrote:
>> There is now an intrinsic for Poly1305, which is only enabled on the
>> `engineUpdate(byte[])` path. This PR adds intrinsic support for
>> `engineUpdate(ByteBuffer)` (when the ByteBuffer `hasArray()`).
>>
>> Fuzzing test expanded to also
On Thu, 10 Aug 2023 15:30:19 GMT, Swati Sharma wrote:
> In addition to the issue
> [JDK-8311178](https://bugs.openjdk.org/browse/JDK-8311178), logically fixing
> the scope from benchmark to thread for below benchmark files having shared
> state, also which fixes few of the benchmarks scalabili
On Wed, 13 Sep 2023 20:25:22 GMT, Smita Kamath wrote:
>> Hi All,
>> I would like to submit AES-GCM optimization for x86_64 architectures using
>> AVX2 instructions. This optimization interleaves AES and GHASH operations.
>>
>> Below are the performance numbers on my desktop system with -XX:Use
On Tue, 10 Oct 2023 23:49:18 GMT, Smita Kamath wrote:
>> Hi All,
>> I would like to submit AES-GCM optimization for x86_64 architectures using
>> AVX2 instructions. This optimization interleaves AES and GHASH operations.
>>
>> Below are the performance numbers on my desktop system with -XX:Use
On Wed, 11 Oct 2023 22:05:08 GMT, Smita Kamath wrote:
>> Hi All,
>> I would like to submit AES-GCM optimization for x86_64 architectures using
>> AVX2 instructions. This optimization interleaves AES and GHASH operations.
>>
>> Below are the performance numbers on my desktop system with -XX:Use
On Fri, 10 May 2024 00:19:32 GMT, Volodymyr Paprotski wrote:
>> Performance. Before:
>>
>> Benchmark                      (algorithm)  (dataSize)  (keyLength)  (provider)  Mode  Cnt  Score  Error  Units
>> SignatureBench.ECDSA.sign  SHA256withECDSA        1024          256
On Fri, 17 May 2024 21:16:47 GMT, Volodymyr Paprotski wrote:
>> Performance. Before:
>>
>> Benchmark                      (algorithm)  (dataSize)  (keyLength)  (provider)  Mode  Cnt  Score  Error  Units
>> SignatureBench.ECDSA.sign  SHA256withECDSA        1024          256
On Fri, 14 Jun 2024 22:01:44 GMT, Volodymyr Paprotski wrote:
>> This fix recovers XDH performance but removes some of the P256 gains
>> (~-8-14%). Still faster, but not as much.
>>
>> The fix is to undo 'int' return type on mult()/square(), which allowed to
>> return partially reduced result (
On Mon, 17 Jun 2024 16:38:55 GMT, Volodymyr Paprotski wrote:
>> This fix recovers XDH performance but removes some of the P256 gains
>> (~-8-14%). Still faster, but not as much.
>>
>> The fix is to undo 'int' return type on mult()/square(), which allowed to
>> return partially reduced result (
>
> The commit being backported was authored by Volodymyr Paprotski on 25 Jun
> 2024 and was reviewed by Sandhya Viswanathan, Vladimir Kozlov, Ferenc Rakoczi
> and Anthony Scarpino.
>
> Thanks!
Marked as reviewed by sviswanathan (Reviewer).
-
PR Review: https
On Tue, 19 Nov 2024 17:08:35 GMT, Volodymyr Paprotski wrote:
>> Measuring throughput with JMH parameters `-f 1 -i 2 -wi 3 -r 20 -w 30 -p
>> algorithm=AES/CBC/NoPadding -p dataSize=3000 -p provider=SunJCE -p
>> keyLength=128 org.openjdk.bench.javax.crypto.full.AESBench`
>>
>> Before:
>>
On Tue, 19 Nov 2024 00:24:04 GMT, Volodymyr Paprotski wrote:
>> Measuring throughput with JMH parameters `-f 1 -i 2 -wi 3 -r 20 -w 30 -p
>> algorithm=AES/CBC/NoPadding -p dataSize=3000 -p provider=SunJCE -p
>> keyLength=128 org.openjdk.bench.javax.crypto.full.AESBench`
>>
>> Before:
>>
On Tue, 19 Nov 2024 17:50:23 GMT, Volodymyr Paprotski wrote:
>> Measuring throughput with JMH parameters `-f 1 -i 2 -wi 3 -r 20 -w 30 -p
>> algorithm=AES/CBC/NoPadding -p dataSize=3000 -p provider=SunJCE -p
>> keyLength=128 org.openjdk.bench.javax.crypto.full.AESBench`
>>
>> Before:
>>
On Tue, 19 Nov 2024 17:57:01 GMT, Volodymyr Paprotski wrote:
>> Measuring throughput with JMH parameters `-f 1 -i 2 -wi 3 -r 20 -w 30 -p
>> algorithm=AES/CBC/NoPadding -p dataSize=3000 -p provider=SunJCE -p
>> keyLength=128 org.openjdk.bench.javax.crypto.full.AESBench`
>>
>> Before:
>>
On Wed, 2 Apr 2025 07:38:34 GMT, Ferenc Rakoczi wrote:
>> By using the AVX-512 vector registers the speed of the computation of the
>> ML-DSA algorithms (key generation, document signing, signature verification)
>> can be approximately doubled.
>
> Ferenc Rakoczi has updated the pull request in
On Tue, 8 Apr 2025 21:27:08 GMT, Ferenc Rakoczi wrote:
>> By using the AVX-512 vector registers the speed of the computation of the
>> ML-DSA algorithms (key generation, document signing, signature verification)
>> can be approximately doubled.
>
> Ferenc Rakoczi has updated the pull request in
On Wed, 9 Apr 2025 17:09:09 GMT, Ferenc Rakoczi wrote:
>> Overall very clean and nicely done PR. Thanks a lot for considering my
>> inputs.
>
>> Overall very clean and nicely done PR. Thanks a lot for considering my
>> inputs.
>
> That is in no small part thanks to the reviewers, especially to
On Mon, 31 Mar 2025 14:40:56 GMT, Ferenc Rakoczi wrote:
>> By using the AVX-512 vector registers the speed of the computation of the
>> ML-DSA algorithms (key generation, document signing, signature verification)
>> can be approximately doubled.
>
> Ferenc Rakoczi has updated the pull request i
On Thu, 20 Feb 2025 21:49:42 GMT, Volodymyr Paprotski wrote:
> Add AVX2 montgomery multiplication intrinsic. (About 60-80% gain)
>
> Also add reduction to existing AVX512 multiplication (this was left-over from
> https://github.com/openjdk/jdk/pull/19893 where a quick fix was required).
> Thi
On Wed, 5 Mar 2025 23:03:23 GMT, Volodymyr Paprotski wrote:
>> Add AVX2 montgomery multiplication intrinsic. (About 60-80% gain)
>>
>> Also add reduction to existing AVX512 multiplication (this was left-over
>> from https://github.com/openjdk/jdk/pull/19893 where a quick fix was
>> required).
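For reference, the operation the AVX2 intrinsic above accelerates is the Montgomery product a·b·R^-1 mod N, which replaces division by N with shifts and masks by R = 2^k. The sketch below is a textbook REDC written with Java `BigInteger`, not the JDK's limb-based vector code; the class name `MontgomeryDemo` and the use of the NIST P-256 field prime with R = 2^256 are illustrative assumptions.

```java
import java.math.BigInteger;

public class MontgomeryDemo {
    final BigInteger n, nPrime, rMask;
    final int k;

    // Requires an odd modulus n with 2^k > n.
    MontgomeryDemo(BigInteger n, int k) {
        BigInteger r = BigInteger.ONE.shiftLeft(k);
        this.n = n;
        this.k = k;
        this.rMask = r.subtract(BigInteger.ONE);
        this.nPrime = n.modInverse(r).negate().mod(r); // -n^-1 mod R
    }

    // REDC: for 0 <= t < n*R, returns t * R^-1 mod n
    BigInteger redc(BigInteger t) {
        BigInteger m = t.and(rMask).multiply(nPrime).and(rMask); // m = (t mod R) * n' mod R
        BigInteger u = t.add(m.multiply(n)).shiftRight(k);       // t + m*n is divisible by R
        return u.compareTo(n) >= 0 ? u.subtract(n) : u;          // u < 2n, one subtraction suffices
    }

    BigInteger toMont(BigInteger a) { return a.shiftLeft(k).mod(n); }       // a*R mod n
    BigInteger fromMont(BigInteger aBar) { return redc(aBar); }             // aBar*R^-1 mod n
    BigInteger montMul(BigInteger aBar, BigInteger bBar) { return redc(aBar.multiply(bBar)); }

    public static void main(String[] args) {
        // P-256 field prime: 2^256 - 2^224 + 2^192 + 2^96 - 1 (example modulus)
        BigInteger p = BigInteger.ONE.shiftLeft(256).subtract(BigInteger.ONE.shiftLeft(224))
                .add(BigInteger.ONE.shiftLeft(192)).add(BigInteger.ONE.shiftLeft(96))
                .subtract(BigInteger.ONE);
        MontgomeryDemo mont = new MontgomeryDemo(p, 256);
        BigInteger a = new BigInteger("123456789abcdef", 16);
        BigInteger b = p.subtract(BigInteger.TWO);
        BigInteger viaMont = mont.fromMont(mont.montMul(mont.toMont(a), mont.toMont(b)));
        System.out.println(viaMont.equals(a.multiply(b).mod(p)));
    }
}
```

Operands stay in Montgomery form (a·R mod N) across a chain of multiplications, so the conversions in and out are paid once rather than per product.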
On Thu, 3 Apr 2025 18:42:35 GMT, Volodymyr Paprotski wrote:
> 8353671: Remove dead code missed in JDK-8350459
Marked as reviewed by sviswanathan (Reviewer).
-
PR Review: https://git.openjdk.org/jdk/pull/24423#pullrequestreview-2741373475
On Mon, 12 May 2025 09:05:10 GMT, Ferenc Rakoczi wrote:
>> By using the AVX-512 vector registers the speed of the computation of the
>> ML-KEM algorithms (key generation, encapsulation, decapsulation) can be
>> approximately doubled.
>
> Ferenc Rakoczi has updated the pull request incrementally
On Wed, 14 May 2025 11:41:30 GMT, Ferenc Rakoczi wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 696:
>>
>>> 694: address generate_kyberAddPoly_2_avx512(StubGenerator *stubgen,
>>> 695:MacroAssembler *_masm) {
>>> 696:
>>
>> The Java co
On Tue, 20 May 2025 11:51:49 GMT, Ferenc Rakoczi wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_kyber.cpp line 250:
>>
>>> 248: static void montmul(int outputRegs[], int inputRegs1[], int
>>> inputRegs2[],
>>> 249: int scratchRegs1[], int scratchRegs2[], MacroAssembler
>>> *_m
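The `montmul` helper quoted above works on arrays of vector registers. Assuming each 16-bit lane performs the signed Montgomery reduction used in the Kyber reference code (q = 3329, R = 2^16), a scalar Java sketch of one lane looks like this. The JDK's own Java and stub code may use a different Montgomery radix, and `KyberMontDemo`, `montgomeryReduce` and `montmul` here are hypothetical names.

```java
public class KyberMontDemo {
    static final int Q = 3329;      // ML-KEM modulus
    static final int QINV = -3327;  // q^-1 mod 2^16 (62209), as a signed 16-bit value

    // Assumption: R = 2^16. Returns r with r ≡ a * 2^-16 (mod q) and |r| < q,
    // valid for |a| < q * 2^15.
    static int montgomeryReduce(int a) {
        short t = (short) (a * QINV);   // t ≡ a * q^-1 (mod 2^16), low 16 bits kept
        return (a - t * Q) >> 16;        // a - t*q is a multiple of 2^16, so the shift is exact
    }

    // Montgomery product of two reduced residues: a * b * 2^-16 (mod q)
    static int montmul(int a, int b) {
        return montgomeryReduce(a * b);
    }

    public static void main(String[] args) {
        System.out.println(montgomeryReduce(1 << 16)); // 2^16 * 2^-16 ≡ 1 (mod q)
    }
}
```

`main` prints `1`. In a vector implementation the low 16-bit product `a * QINV` and the high half of `t * Q` map naturally onto 16-bit lane multiplies, which is one reason the 2^16 radix is popular.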
On Tue, 20 May 2025 17:49:14 GMT, Ferenc Rakoczi wrote:
>> By using the AVX-512 vector registers the speed of the computation of the
>> ML-KEM algorithms (key generation, encapsulation, decapsulation) can be
>> approximately doubled.
>
> Ferenc Rakoczi has updated the pull request incrementally
On Thu, 15 May 2025 13:33:42 GMT, Ferenc Rakoczi wrote:
>> By using the AVX-512 vector registers the speed of the computation of the
>> ML-KEM algorithms (key generation, encapsulation, decapsulation) can be
>> approximately doubled.
>
> Ferenc Rakoczi has updated the pull request incrementally
On Thu, 15 May 2025 00:36:26 GMT, Sandhya Viswanathan wrote:
>> Ferenc Rakoczi has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> Responding to comments by Sandhya.
>
> Another minor comment. Rest of the PR
On Wed, 14 May 2025 11:49:11 GMT, Ferenc Rakoczi wrote:
>> By using the AVX-512 vector registers the speed of the computation of the
>> ML-KEM algorithms (key generation, encapsulation, decapsulation) can be
>> approximately doubled.
>
> Ferenc Rakoczi has updated the pull request incrementally