Integrated: 8353671: Remove dead code missed in JDK-8350459

2025-04-10 Thread Volodymyr Paprotski
On Thu, 3 Apr 2025 18:42:35 GMT, Volodymyr Paprotski wrote: > 8353671: Remove dead code missed in JDK-8350459 This pull request has now been integrated. Changeset: 885cf0ff Author: Volodymyr Paprotski Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/com

Re: RFR: 8353671: Remove dead code missed in JDK-8350459

2025-04-10 Thread Volodymyr Paprotski
On Mon, 7 Apr 2025 14:32:26 GMT, Sean Mullan wrote: > Also, the JBS issue needs an appropriate `noreg` label. Added `noreg-cleanup`, I think that's the best match (?) > Can you add a link to JDK-8350459 in the JBS issue? It's already a subtask of JDK-8350459, so it's 'linked' in a way (though the

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v4]

2025-04-05 Thread Volodymyr Paprotski
On Wed, 19 Mar 2025 19:00:37 GMT, Anthony Scarpino wrote: >> I was mostly attempting to test 'random paths' through the code, and this >> was a way to pseudo-randomly accomplish that. (i.e. a product of a >> difference, a product of a product.. and so on..) >> >> Since this is looping, we got

Re: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v11]

2025-04-05 Thread Volodymyr Paprotski
On Sat, 22 Mar 2025 20:02:31 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the >> ML-DSA algorithms (key generation, document signing, signature verification) >> can be approximately doubled. > > Ferenc Rakoczi has updated the pull request i

Re: RFR: 8353671: Remove dead code missed in JDK-8350459

2025-04-04 Thread Volodymyr Paprotski
On Thu, 3 Apr 2025 18:42:35 GMT, Volodymyr Paprotski wrote: > 8353671: Remove dead code missed in JDK-8350459 @ascarpino If you wouldn't mind, should be a quick one :) - PR Comment: https://git.openjdk.org/jdk/pull/24423#issuecomment-2779414600

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v8]

2025-04-04 Thread Volodymyr Paprotski
On Fri, 4 Apr 2025 14:26:30 GMT, Sean Mullan wrote: > > Done I think: https://bugs.openjdk.org/browse/JDK-8297970 > > Is this link correct? This issue was fixed in JDK 20. Sorry.. copy/paste, didn't notice.. https://bugs.openjdk.org/browse/JDK-8353670 (also ends in *70!) - PR Comme

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v8]

2025-04-03 Thread Volodymyr Paprotski
On Fri, 28 Mar 2025 20:10:42 GMT, Volodymyr Paprotski wrote: >> src/java.base/share/classes/sun/security/util/math/intpoly/MontgomeryIntegerPolynomialP256.java >> line 164: >> >>> 162: protected void mult(long[] a, long[] b, long[] r) { >>> 163

RFR: 8353671: Remove dead code missed in JDK-8350459

2025-04-03 Thread Volodymyr Paprotski
8353671: Remove dead code missed in JDK-8350459 - Commit messages: - remove dead code Changes: https://git.openjdk.org/jdk/pull/24423/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24423&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353671 Stats: 23 lines in 1 f

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v8]

2025-04-03 Thread Volodymyr Paprotski
On Mon, 31 Mar 2025 19:57:59 GMT, Sean Mullan wrote: > > > I think it would also be useful to write a release note describing the > > > approximate performance improvement gains for the crypto algorithms as > > > displayed in your chart. Thanks. > > > > > > @seanjmullan I think I only done th

Re: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v12]

2025-04-01 Thread Volodymyr Paprotski
On Mon, 31 Mar 2025 14:40:56 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the >> ML-DSA algorithms (key generation, document signing, signature verification) >> can be approximately doubled. > > Ferenc Rakoczi has updated the pull request i

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v8]

2025-03-28 Thread Volodymyr Paprotski
On Fri, 28 Mar 2025 14:39:23 GMT, Sean Mullan wrote: > I think it would also be useful to write a release note describing the > approximate performance improvement gains for the crypto algorithms as > displayed in your chart. Thanks. @seanjmullan I think I've only done that once, can't find the 'i

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v8]

2025-03-28 Thread Volodymyr Paprotski
On Fri, 28 Mar 2025 18:20:31 GMT, Andrey Turbanov wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Fix copyright stmt > > src/java.base/share/classes/

Integrated: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64

2025-03-28 Thread Volodymyr Paprotski
On Thu, 20 Feb 2025 21:49:42 GMT, Volodymyr Paprotski wrote: > Add AVX2 Montgomery multiplication intrinsic. (About 60-80% gain) > > Also add reduction to existing AVX512 multiplication (this was left-over from > https://github.com/openjdk/jdk/pull/19893 where a quick fix

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v8]

2025-03-28 Thread Volodymyr Paprotski
On Thu, 27 Mar 2025 19:13:59 GMT, Volodymyr Paprotski wrote: >> Add AVX2 Montgomery multiplication intrinsic. (About 60-80% gain) >> >> Also add reduction to existing AVX512 multiplication (this was left-over >> from https://github.com/openjdk/jdk/pull/19893 where a qu

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v7]

2025-03-27 Thread Volodymyr Paprotski
On Thu, 27 Mar 2025 18:11:32 GMT, Anthony Scarpino wrote: >> Volodymyr Paprotski has updated the pull request incrementally with two >> additional commits since the last revision: >> >> - whitespace >> - prettify test > > Wait on integration. I need to c

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v8]

2025-03-27 Thread Volodymyr Paprotski
gth) > (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.signSHA256withECDSA1024 256 > thrpt 409621.950 ± 27.260 ops/s > SignatureBench.ECDSA.signSHA256withECDSA 16384 256 > thrpt 408975.654 ±

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v7]

2025-03-27 Thread Volodymyr Paprotski
On Thu, 27 Mar 2025 18:52:53 GMT, Anthony Scarpino wrote: >> Volodymyr Paprotski has updated the pull request incrementally with two >> additional commits since the last revision: >> >> - whitespace >> - prettify test > > src/java.base/share/cl

Re: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v10]

2025-03-24 Thread Volodymyr Paprotski
On Sat, 22 Mar 2025 16:45:31 GMT, Volodymyr Paprotski wrote: >> Ferenc Rakoczi has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Fix windows build > > src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cp

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v4]

2025-03-24 Thread Volodymyr Paprotski
On Thu, 20 Mar 2025 17:34:53 GMT, Anthony Scarpino wrote: >> I used this testcase for development (and figured I should also check it >> in..) so what might be 'obvious' to me, might not be for anyone else? >> >> Typically, when a test failed, I grabbed the SEED from the test output, >> re

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v7]

2025-03-24 Thread Volodymyr Paprotski
gth) > (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.signSHA256withECDSA1024 256 > thrpt 409621.950 ± 27.260 ops/s > SignatureBench.ECDSA.signSHA256withECDSA 16384 256 > thrpt 408975.654 ±

Re: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v11]

2025-03-24 Thread Volodymyr Paprotski
On Sat, 22 Mar 2025 20:02:31 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the >> ML-DSA algorithms (key generation, document signing, signature verification) >> can be approximately doubled. > > Ferenc Rakoczi has updated the pull request i

Re: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v11]

2025-03-24 Thread Volodymyr Paprotski
On Sat, 22 Mar 2025 20:38:19 GMT, Volodymyr Paprotski wrote: >> Ferenc Rakoczi has updated the pull request incrementally with two >> additional commits since the last revision: >> >> - Further readability improvements. >> - Added asserts for array

Re: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v7]

2025-03-22 Thread Volodymyr Paprotski
On Thu, 20 Mar 2025 21:06:30 GMT, Ferenc Rakoczi wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_dilithium.cpp line 58: >> >>> 56: >>> 57: ATTRIBUTE_ALIGNED(64) static const uint32_t dilithiumAvx512Perms[] = { >>> 58: // collect montmul results into the destination register >> >> same

Re: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v11]

2025-03-22 Thread Volodymyr Paprotski
On Sat, 22 Mar 2025 20:02:31 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the >> ML-DSA algorithms (key generation, document signing, signature verification) >> can be approximately doubled. > > Ferenc Rakoczi has updated the pull request i

Re: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v10]

2025-03-22 Thread Volodymyr Paprotski
On Thu, 20 Mar 2025 20:37:25 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the >> ML-DSA algorithms (key generation, document signing, signature verification) >> can be approximately doubled. > > Ferenc Rakoczi has updated the pull request i

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v6]

2025-03-18 Thread Volodymyr Paprotski
gth) > (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.signSHA256withECDSA1024 256 > thrpt 409621.950 ± 27.260 ops/s > SignatureBench.ECDSA.signSHA256withECDSA 16384 256 > thrpt 408975.654 ±

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v4]

2025-03-17 Thread Volodymyr Paprotski
On Mon, 10 Mar 2025 23:07:45 GMT, Volodymyr Paprotski wrote: >> test/jdk/com/sun/security/util/math/intpoly/MontgomeryPolynomialFuzzTest.java >> line 30: >> >>> 28: import sun.security.util.math.intpoly.*; >>> 29: >>> 30: /* >> >> It

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v5]

2025-03-17 Thread Volodymyr Paprotski
gth) > (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.signSHA256withECDSA1024 256 > thrpt 409621.950 ± 27.260 ops/s > SignatureBench.ECDSA.signSHA256withECDSA 16384 256 > thrpt 408975.654 ±

Re: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v5]

2025-03-17 Thread Volodymyr Paprotski
On Thu, 6 Mar 2025 17:37:33 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the >> ML-DSA algorithms (key generation, document signing, signature verification) >> can be approximately doubled. > > Ferenc Rakoczi has updated the pull request in

Re: RFR: 8351034: Add AVX-512 intrinsics for ML-DSA [v7]

2025-03-17 Thread Volodymyr Paprotski
On Wed, 12 Mar 2025 19:19:08 GMT, Ferenc Rakoczi wrote: >> By using the AVX-512 vector registers the speed of the computation of the >> ML-DSA algorithms (key generation, document signing, signature verification) >> can be approximately doubled. > > Ferenc Rakoczi has updated the pull request i

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v4]

2025-03-10 Thread Volodymyr Paprotski
On Mon, 10 Mar 2025 22:49:06 GMT, Anthony Scarpino wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> more comment improvements > > test/jdk/com/

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v4]

2025-03-05 Thread Volodymyr Paprotski
gth) > (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.signSHA256withECDSA1024 256 > thrpt 409621.950 ± 27.260 ops/s > SignatureBench.ECDSA.signSHA256withECDSA 16384 256 > thrpt 408975.654 ±

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v3]

2025-03-04 Thread Volodymyr Paprotski
gth) > (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.signSHA256withECDSA1024 256 > thrpt 409621.950 ± 27.260 ops/s > SignatureBench.ECDSA.signSHA256withECDSA 16384 256 > thrpt 408975.654 ±

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v2]

2025-03-04 Thread Volodymyr Paprotski
On Thu, 27 Feb 2025 19:05:50 GMT, Sandhya Viswanathan wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> comments from Sandhya > > src/hotspot/cpu/x86/stubGenerator_x86_64_pol

Re: RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64 [v2]

2025-03-04 Thread Volodymyr Paprotski
gth) > (provider) Mode Cnt Score Error Units > SignatureBench.ECDSA.signSHA256withECDSA1024 256 > thrpt 409621.950 ± 27.260 ops/s > SignatureBench.ECDSA.signSHA256withECDSA 16384 256 > thrpt 408975.654 ±

RFR: 8350459: MontgomeryIntegerPolynomialP256 multiply intrinsic with AVX2 on x86_64

2025-02-21 Thread Volodymyr Paprotski
Add AVX2 Montgomery multiplication intrinsic. (About 60-80% gain) Also add reduction to existing AVX512 multiplication (this was left-over from https://github.com/openjdk/jdk/pull/19893 where a quick fix was required). This is mostly for cleanup, but there is about 1-2% gain. Before (no AVX512)
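For readers unfamiliar with the operation being accelerated, here is a minimal sketch of Montgomery multiplication modulo the P-256 prime, written with `BigInteger` for clarity. It is not the JDK implementation: `MontgomeryIntegerPolynomialP256` works on arrays of `long` limbs and the intrinsic uses vector instructions. The class name `MontgomerySketch`, the helper names, and the choice R = 2^256 are all assumptions made for illustration.

```java
import java.math.BigInteger;

public class MontgomerySketch {
    // P-256 prime: 2^256 - 2^224 + 2^192 + 2^96 - 1
    static final BigInteger P = BigInteger.ONE.shiftLeft(256)
            .subtract(BigInteger.ONE.shiftLeft(224))
            .add(BigInteger.ONE.shiftLeft(192))
            .add(BigInteger.ONE.shiftLeft(96))
            .subtract(BigInteger.ONE);

    // Illustrative choice of Montgomery radix R = 2^K (an assumption of this sketch).
    static final int K = 256;
    static final BigInteger R = BigInteger.ONE.shiftLeft(K);
    static final BigInteger MASK = R.subtract(BigInteger.ONE);

    // N' = -P^{-1} mod R, precomputed once.
    static final BigInteger N_PRIME = P.modInverse(R).negate().mod(R);

    // Convert a into Montgomery form: a * R mod P.
    static BigInteger toMont(BigInteger a) {
        return a.shiftLeft(K).mod(P);
    }

    // Convert back: multiplying by 1 removes one factor of R.
    static BigInteger fromMont(BigInteger a) {
        return montMul(a, BigInteger.ONE);
    }

    // Montgomery product: a * b * R^{-1} mod P, using only
    // multiplications, masks, and shifts (no division by P).
    static BigInteger montMul(BigInteger a, BigInteger b) {
        BigInteger t = a.multiply(b);
        BigInteger m = t.and(MASK).multiply(N_PRIME).and(MASK);
        BigInteger u = t.add(m.multiply(P)).shiftRight(K);
        // u is in [0, 2P); one conditional subtraction completes the reduction.
        return u.compareTo(P) >= 0 ? u.subtract(P) : u;
    }

    public static void main(String[] args) {
        BigInteger a = new BigInteger("123456789012345678901234567890");
        BigInteger b = new BigInteger("987654321098765432109876543210");
        BigInteger viaMont = fromMont(montMul(toMont(a), toMont(b)));
        System.out.println(viaMont.equals(a.multiply(b).mod(P)));
    }
}
```

The point of the representation is that the reduction step needs only a mask and a shift by K bits, which maps well onto limb-wise or vectorized code.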

Re: RFR: 8344766: AES/CTR slow at big payloads [v2]

2024-11-27 Thread Volodymyr Paprotski
On Wed, 27 Nov 2024 10:59:06 GMT, Jatin Bhateja wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> fix date > > src/java.base/share/classes/com/sun/crypto/provider/Counte

Integrated: 8344766: AES/CTR slow at big payloads

2024-11-27 Thread Volodymyr Paprotski
On Thu, 21 Nov 2024 18:03:44 GMT, Volodymyr Paprotski wrote: > This is a follow up to https://github.com/openjdk/jdk/pull/22086 for AES/CTR > > Before: > > Benchmark(algorithm) (dataSize) (keyLength) (provider) > Mode CntScoreError Units

Re: RFR: 8344766: AES/CTR slow at big payloads [v2]

2024-11-27 Thread Volodymyr Paprotski
On Tue, 26 Nov 2024 15:19:25 GMT, Volodymyr Paprotski wrote: >> This is a follow up to https://github.com/openjdk/jdk/pull/22086 for AES/CTR >> >> Before: >> >> Benchmark(algorithm) (dataSize) (keyLength) (provider) >>

Re: RFR: 8344766: AES/CTR slow at big payloads [v2]

2024-11-27 Thread Volodymyr Paprotski
On Wed, 27 Nov 2024 15:10:09 GMT, Jatin Bhateja wrote: >> Agree with @theRealAph , loop induces safe point on back edges which gives >> opportunity to gc epochs. > >> As Andrew points out, giving an intrinsic lots of data, 'backdoors/breaks' a >> lot of existing algorithms.. from GC not happen

Re: RFR: 8344766: AES/CTR slow at big payloads [v2]

2024-11-27 Thread Volodymyr Paprotski
On Wed, 27 Nov 2024 14:45:36 GMT, Andrew Haley wrote: >> For CRC32 digest computation we do support intrinsic at interpreter and c1 >> compiler level to overcome such warmup related penalties. > > This is not just a good idea to trigger OSR and therefore use the intrinsic, > it's a good idea be

Re: RFR: 8344766: AES/CTR slow at big payloads [v2]

2024-11-26 Thread Volodymyr Paprotski
thrpt3 218.882 ± 2.446 ops/s > AESBench.encrypt2 AES/CTR/NoPadding3000 128 SunJCE > thrpt3 425.402 ± 4.205 ops/s Volodymyr Paprotski has updated the pull request incrementally with one additional commit since the last revision: fix date ---

Integrated: 8344144: AES/CBC slow at big payloads

2024-11-21 Thread Volodymyr Paprotski
On Wed, 13 Nov 2024 21:14:58 GMT, Volodymyr Paprotski wrote: > Measuring throughput with JMH parameters `-f 1 -i 2 -wi 3 -r 20 -w 30 -p > algorithm=AES/CBC/NoPadding -p dataSize=3000 -p provider=SunJCE -p > keyLength=128 org.openjdk.bench.javax.crypto.full.AESBench`

RFR: 8344766: AES/CTR slow at big payloads

2024-11-21 Thread Volodymyr Paprotski
This is a follow up to https://github.com/openjdk/jdk/pull/22086 for AES/CTR Before: Benchmark(algorithm) (dataSize) (keyLength) (provider) Mode CntScoreError Units AESBench.decrypt AES/CTR/NoPadding3000 128 SunJCE thrpt3 16.491 ± 0

Re: RFR: 8344144: AES/CBC slow at big payloads [v7]

2024-11-21 Thread Volodymyr Paprotski
On Tue, 19 Nov 2024 18:01:42 GMT, Volodymyr Paprotski wrote: >> Measuring throughput with JMH parameters `-f 1 -i 2 -wi 3 -r 20 -w 30 -p >> algorithm=AES/CBC/NoPadding -p dataSize=3000 -p provider=SunJCE -p >> keyLength=128 org.openjdk.bench.javax.crypto.full.AESBe

Re: RFR: 8344144: AES/CBC slow at big payloads [v7]

2024-11-20 Thread Volodymyr Paprotski
On Tue, 19 Nov 2024 18:01:42 GMT, Volodymyr Paprotski wrote: >> Measuring throughput with JMH parameters `-f 1 -i 2 -wi 3 -r 20 -w 30 -p >> algorithm=AES/CBC/NoPadding -p dataSize=3000 -p provider=SunJCE -p >> keyLength=128 org.openjdk.bench.javax.crypto.full.AESBe

Re: RFR: 8344144: AES/CBC slow at big payloads [v6]

2024-11-19 Thread Volodymyr Paprotski
I have not deterministically proven why chunking works: before the change, > the CBC intrinsic is not being used; and after chunking, it is. There is > quite a bit of GC activity in the default AESBench, so `encrypt2/decrypt2` > versions isolate just crypto (see comment below). Volody

Re: RFR: 8344144: AES/CBC slow at big payloads [v7]

2024-11-19 Thread Volodymyr Paprotski
I have not deterministically proven why chunking works: before the change, > the CBC intrinsic is not being used; and after chunking, it is. There is > quite a bit of GC activity in the default AESBench, so `encrypt2/decrypt2` > versions isolate just crypto (see comment below). Volody

Re: RFR: 8344144: AES/CBC slow at big payloads [v5]

2024-11-19 Thread Volodymyr Paprotski
I have not deterministically proven why chunking works: before the change, > the CBC intrinsic is not being used; and after chunking, it is. There is > quite a bit of GC activity in the default AESBench, so `encrypt2/decrypt2` > versions isolate just crypto (see comment below

Re: RFR: 8344144: AES/CBC slow at big payloads [v2]

2024-11-19 Thread Volodymyr Paprotski
On Fri, 15 Nov 2024 19:43:18 GMT, Artur Barashev wrote: >> I don't think this constant needs to be dynamic. The reason I mention >> blocksize, the intrinsic expects multiple of block size >> (`ArrayUtil.blockSizeCheck(plainLen, blockSize);` assert before-hand), but >> it's otherwise unrelated t
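The chunking idea discussed in this thread can be sketched roughly as follows: a large input is split into fixed-size pieces, each a multiple of the cipher block size, so that the per-chunk method is called many times rather than once. This is only an illustration under stated assumptions, not the code from the PR: the class name `ChunkedCall`, the method `chunkRanges`, and the `CHUNK_SIZE` value are hypothetical, and the actual constant chosen in the change is not shown in this listing.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedCall {
    static final int BLOCK_SIZE = 16;                 // AES block size in bytes
    static final int CHUNK_SIZE = 1024 * BLOCK_SIZE;  // hypothetical value, a multiple of BLOCK_SIZE

    // Split [offset, offset + len) into {offset, length} pairs, each at most
    // CHUNK_SIZE long. Because len and CHUNK_SIZE are both multiples of
    // BLOCK_SIZE, every chunk length is a multiple of BLOCK_SIZE as well.
    static List<int[]> chunkRanges(int offset, int len) {
        if (len < 0 || len % BLOCK_SIZE != 0) {
            throw new IllegalArgumentException("length must be a non-negative multiple of the block size");
        }
        List<int[]> ranges = new ArrayList<>();
        while (len > 0) {
            int n = Math.min(len, CHUNK_SIZE);
            ranges.add(new int[] {offset, n});
            offset += n;
            len -= n;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // A caller would invoke the per-chunk encrypt/decrypt routine once per range.
        for (int[] r : chunkRanges(0, 3 * CHUNK_SIZE + 32)) {
            System.out.println("offset=" + r[0] + " length=" + r[1]);
        }
    }
}
```

A fixed chunk size keeps the loop simple; the only constraint taken from the discussion above is that each piece stays a multiple of the block size.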

Re: RFR: 8344144: AES/CBC slow at big payloads [v2]

2024-11-19 Thread Volodymyr Paprotski
On Fri, 15 Nov 2024 18:27:25 GMT, Volodymyr Paprotski wrote: > Please include the benchmarking tests in this PR. Done There are some CI failures on mac in 'GetStackTraceALotWhen*' tests that seem unrelated. - PR Comment: https://git.openjdk.org/jdk/pull/22086

Re: RFR: 8344144: AES/CBC slow at big payloads [v2]

2024-11-18 Thread Volodymyr Paprotski
On Tue, 19 Nov 2024 00:06:19 GMT, Anthony Scarpino wrote: >> But it takes a few calls before hotspot switches to the intrinsic, so it >> can't be too large. I think we should include this logic explanation (the >> intrinsic parallelizes decryption) in the comment to make it clear what we >> a

Re: RFR: 8344144: AES/CBC slow at big payloads [v4]

2024-11-18 Thread Volodymyr Paprotski
I have not deterministically proven why chunking works: before the change, > the CBC intrinsic is not being used; and after chunking, it is. There is > quite a bit of GC activity in the default AESBench, so `encrypt2/decrypt2` > versions isolate just crypto (see comment below). Volody

Re: RFR: 8344144: AES/CBC slow at big payloads [v3]

2024-11-18 Thread Volodymyr Paprotski
I have not deterministically proven why chunking works: before the change, > the CBC intrinsic is not being used; and after chunking, it is. There is > quite a bit of GC activity in the default AESBench, so `encrypt2/decrypt2` > versions isolate just crypto (see comment below). Volody

Re: RFR: 8344144: AES/CBC slow at big payloads [v2]

2024-11-15 Thread Volodymyr Paprotski
On Thu, 14 Nov 2024 16:20:22 GMT, Artur Barashev wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> comments from Kevin > > src/java.base/share/classes/com/sun/crypto/provider/Ci

Re: RFR: 8344144: AES/CBC slow at big payloads [v2]

2024-11-15 Thread Volodymyr Paprotski
On Fri, 15 Nov 2024 12:52:28 GMT, Artur Barashev wrote: >> I don't think it matters either way performance-wise, or from any other >> point of view in this case, but as a rule of thumb, I think for >> readability/maintainability it is worth to give up a bit of code size >> (especially if that

Re: RFR: 8344144: AES/CBC slow at big payloads [v2]

2024-11-15 Thread Volodymyr Paprotski
On Thu, 14 Nov 2024 00:44:35 GMT, Volodymyr Paprotski wrote: >> Measuring throughput with JMH parameters `-f 1 -i 2 -wi 3 -r 20 -w 30 -p >> algorithm=AES/CBC/NoPadding -p dataSize=3000 -p provider=SunJCE -p >> keyLength=128 org.openjdk.bench.javax.crypto.full.AESBe

Re: RFR: 8344144: AES/CBC slow at big payloads [v2]

2024-11-15 Thread Volodymyr Paprotski
On Thu, 14 Nov 2024 17:26:13 GMT, Artur Barashev wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> comments from Kevin > > src/java.base/share/classes/com/sun/crypto/provider/Ci

RFR: 8344144: AES/CBC slow at big payloads

2024-11-13 Thread Volodymyr Paprotski
Measuring throughput with JMH parameters `-f 1 -i 2 -wi 3 -r 20 -w 30 -p algorithm=AES/CBC/NoPadding -p dataSize=3000 -p provider=SunJCE -p keyLength=128 org.openjdk.bench.javax.crypto.full.AESBench` Before: Benchmark(algorithm) (dataSize) (keyLength) (provider) Mode

Re: RFR: 8344144: AES/CBC slow at big payloads

2024-11-13 Thread Volodymyr Paprotski
On Wed, 13 Nov 2024 21:14:58 GMT, Volodymyr Paprotski wrote: > Measuring throughput with JMH parameters `-f 1 -i 2 -wi 3 -r 20 -w 30 -p > algorithm=AES/CBC/NoPadding -p dataSize=3000 -p provider=SunJCE -p > keyLength=128 org.openjdk.bench.javax.crypto.full.AESBench`

Re: RFR: 8344144: AES/CBC slow at big payloads [v2]

2024-11-13 Thread Volodymyr Paprotski
On Wed, 13 Nov 2024 21:34:34 GMT, Kevin Driver wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> comments from Kevin > > src/java.base/share/classes/com/sun/crypto/provider/Ci

Re: RFR: 8344144: AES/CBC slow at big payloads [v2]

2024-11-13 Thread Volodymyr Paprotski
I have not deterministically proven why chunking works: before the change, > the CBC intrinsic is not being used; and after chunking, it is. There is > quite a bit of GC activity in the default AESBench, so `encrypt2/decrypt2` > versions isolate just crypto (see comment below). Volody

[jdk23] Integrated: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538

2024-06-26 Thread Volodymyr Paprotski
On Tue, 25 Jun 2024 23:50:20 GMT, Volodymyr Paprotski wrote: > Hi all, > > This pull request contains a backport of commit > [f101e153](https://github.com/openjdk/jdk/commit/f101e153cee68750fcf1f12da10e29806875b522) > from the [openjdk/jdk](https://git.openjdk.org/jdk) repos

[jdk23] RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538

2024-06-25 Thread Volodymyr Paprotski
Hi all, This pull request contains a backport of commit [f101e153](https://github.com/openjdk/jdk/commit/f101e153cee68750fcf1f12da10e29806875b522) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. Thanks! - Commit messages: - Backport f101e153cee68750fcf1f12da10e298

Integrated: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538

2024-06-25 Thread Volodymyr Paprotski
On Fri, 14 Jun 2024 20:23:04 GMT, Volodymyr Paprotski wrote: > This fix recovers XDH performance but removes some of the P256 gains > (~-8-14%). Still faster, but not as much. > > The fix is to undo 'int' return type on mult()/square(), which allowed to > return part

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]

2024-06-25 Thread Volodymyr Paprotski
On Tue, 25 Jun 2024 17:31:09 GMT, Ferenc Rakoczi wrote: >> Hi @vpaprotsk, >> @ferakocz is going to take a look at the change. When he says it's ok, I'll >> approve the PR. > > @ascarpino please approve this change. Thanks @ferakocz @ascarpino - PR Comment: https://git.openjdk.or

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]

2024-06-24 Thread Volodymyr Paprotski
On Mon, 24 Jun 2024 14:48:43 GMT, Ferenc Rakoczi wrote: >> @ferakocz just tagging you as reminder of (the many) items in your queue :) >> Thanks! > >> @ferakocz just tagging you as reminder of (the many) items in your queue :) >> Thanks! > > Sorry, I was out of office last week. I will take a

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]

2024-06-20 Thread Volodymyr Paprotski
On Mon, 17 Jun 2024 16:38:55 GMT, Volodymyr Paprotski wrote: >> This fix recovers XDH performance but removes some of the P256 gains >> (~-8-14%). Still faster, but not as much. >> >> The fix is to undo 'int' return type on mult()/square(), which allowed to

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]

2024-06-18 Thread Volodymyr Paprotski
On Tue, 18 Jun 2024 15:10:37 GMT, Vladimir Kozlov wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> comment from Sandhya > > @TobiHartmann ran our testing and it passed.

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]

2024-06-17 Thread Volodymyr Paprotski
On Mon, 17 Jun 2024 23:29:18 GMT, Vladimir Kozlov wrote: > Talking about future improvements. Is it possible to optimize reduction code > by converting it to intrinsic too? Or code generated by C2 is good enough? I had some experiments to try where I was using virtual methods to add optimizati

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]

2024-06-17 Thread Volodymyr Paprotski
On Mon, 17 Jun 2024 21:21:01 GMT, Vladimir Kozlov wrote: > Let me know that I got it right: > > * The reduction operation was optional and P256 benefitted by not executing > it. > * Previous `mult()` **Java** code always returned 0 because it executes > reduction so callers do not need to do it

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]

2024-06-17 Thread Volodymyr Paprotski
On Mon, 17 Jun 2024 19:22:01 GMT, Vladimir Kozlov wrote: > Looking on `MontgomeryIntegerPolynomialP256.java` the code in `multImpl() + > reducePositive()` is similar to original `mult()` except new additional code > at the end of `multImpl()`. Yep, I split the original java mult() into multIm

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]

2024-06-17 Thread Volodymyr Paprotski
On Mon, 17 Jun 2024 18:12:16 GMT, Vladimir Kozlov wrote: > What causes regression in P256 "(~-8-14%)"? From what I see, you re-arranged > code to not execute some code ("reducePositive()") when it is not needed. How > this affects P256? Actually, the other way around; reducePositive is now an

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v2]

2024-06-17 Thread Volodymyr Paprotski
On Fri, 14 Jun 2024 23:39:54 GMT, Sandhya Viswanathan wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Improve non-intrinsic p256 performance > > src/hotspot/share/opto/run

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v3]

2024-06-17 Thread Volodymyr Paprotski
256 > EC thrpt3 1350.745 ± 28.514 ops/s > o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 > EC thrpt3 1349.393 ± 32.050 ops/s > > Performance in master without mult() intrins

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538 [v2]

2024-06-14 Thread Volodymyr Paprotski
256 > EC thrpt3 1350.745 ± 28.514 ops/s > o.o.b.j.c.small.KeyAgreementBench.EC.generateSecret ECDH 256 > EC thrpt3 1349.393 ± 32.050 ops/s > > Performance in master without mult() intrins

Re: RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538

2024-06-14 Thread Volodymyr Paprotski
On Fri, 14 Jun 2024 20:23:04 GMT, Volodymyr Paprotski wrote: > This fix recovers XDH performance but removes some of the P256 gains > (~-8-14%). Still faster, but not as much. > > The fix is to undo 'int' return type on mult()/square(), which allowed to > return part

RFR: 8333583: Crypto-XDH.generateSecret regression after JDK-8329538

2024-06-14 Thread Volodymyr Paprotski
This fix recovers XDH performance but removes some of the P256 gains (~-8-14%). Still faster, but not as much. The fix is to undo 'int' return type on mult()/square(), which allowed returning a partially reduced result (i.e. this avoids extra reductions when mult() result is fed into addition). T

Integrated: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic

2024-05-22 Thread Volodymyr Paprotski
On Tue, 2 Apr 2024 15:42:05 GMT, Volodymyr Paprotski wrote: > Performance. Before: > > Benchmark(algorithm) (dataSize) (keyLength) > (provider) Mode Cnt ScoreError Units > SignatureBench.ECDSA.signSHA256withECDSA10

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v12]

2024-05-22 Thread Volodymyr Paprotski
On Tue, 21 May 2024 17:41:46 GMT, Volodymyr Paprotski wrote: >> Performance. Before: >> >> Benchmark(algorithm) (dataSize) (keyLength) >> (provider) Mode Cnt ScoreError Units >> SignatureBench.ECDSA.signSHA256with

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11]

2024-05-21 Thread Volodymyr Paprotski
On Tue, 21 May 2024 07:21:14 GMT, Tobias Hartmann wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> shenandoah verifier > > I'm getting some conflicts when trying to apply

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v12]

2024-05-21 Thread Volodymyr Paprotski
ntBench.EC.generateSecret ECDH 256 > EC thrpt3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score > Error Units > PolynomialP256Bench.benchMultiply true thrpt3 1919.574 ± >

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11]

2024-05-17 Thread Volodymyr Paprotski
On Fri, 17 May 2024 21:16:47 GMT, Volodymyr Paprotski wrote: >> Performance. Before: >> >> Benchmark(algorithm) (dataSize) (keyLength) >> (provider) Mode Cnt ScoreError Units >> SignatureBench.ECDSA.signSHA256with

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v11]

2024-05-17 Thread Volodymyr Paprotski
ntBench.EC.generateSecret ECDH 256 > EC thrpt3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score > Error Units > PolynomialP256Bench.benchMultiply true thrpt3 1919.574

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v9]

2024-05-17 Thread Volodymyr Paprotski
On Thu, 16 May 2024 23:21:36 GMT, Sandhya Viswanathan wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> whitespace > > src/hotspot/cpu/x86/stubGenerator_x86_64_pol

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v10]

2024-05-17 Thread Volodymyr Paprotski
ntBench.EC.generateSecret ECDH 256 > EC thrpt3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score > Error Units > PolynomialP256Bench.benchMultiply true thrpt3 1919.574

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v8]

2024-05-09 Thread Volodymyr Paprotski
ntBench.EC.generateSecret ECDH 256 > EC thrpt3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score > Error Units > PolynomialP256Bench.benchMultiply true thrpt3 1919.574

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v9]

2024-05-09 Thread Volodymyr Paprotski
ntBench.EC.generateSecret ECDH 256 > EC thrpt 3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score > Error Units > PolynomialP256Bench.benchMultiply true thrpt 3 1919.574

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v7]

2024-05-09 Thread Volodymyr Paprotski
On Thu, 9 May 2024 23:36:03 GMT, Anthony Scarpino wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> whitespace > > src/java.base/share/classes/sun/security/ec/ECOpera

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v7]

2024-05-09 Thread Volodymyr Paprotski
ntBench.EC.generateSecret ECDH 256 > EC thrpt 3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score > Error Units > PolynomialP256Bench.benchMultiply true thrpt 3 1919.574

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v6]

2024-05-06 Thread Volodymyr Paprotski
ntBench.EC.generateSecret ECDH 256 > EC thrpt 3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score > Error Units > PolynomialP256Bench.benchMultiply true thrpt 3 1919.574

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v5]

2024-04-25 Thread Volodymyr Paprotski
ntBench.EC.generateSecret ECDH 256 > EC thrpt 3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score > Error Units > PolynomialP256Bench.benchMultiply true thrpt 3 1919.574

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v2]

2024-04-24 Thread Volodymyr Paprotski
On Tue, 9 Apr 2024 02:01:36 GMT, Anthony Scarpino wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> remove use of jdk.crypto.ec > > src/java.base/share/classes/sun/security

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v3]

2024-04-24 Thread Volodymyr Paprotski
On Tue, 23 Apr 2024 19:55:57 GMT, Anthony Scarpino wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Comments from Jatin and Tony > > src/java.base/share/classes/sun/security

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v2]

2024-04-24 Thread Volodymyr Paprotski
On Tue, 16 Apr 2024 02:26:57 GMT, Jatin Bhateja wrote: >> Per the above, this is a switch statement (`UNLIKELY`) fallback. I can still add >> alignment and loop rotation, but since it is a fallback, I figured it's more important >> to keep it small and readable... > > It's all part of intrinsic, no harm in polis

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v4]

2024-04-24 Thread Volodymyr Paprotski
ntBench.EC.generateSecret ECDH 256 > EC thrpt 3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score > Error Units > PolynomialP256Bench.benchMultiply true thrpt 3 1919.574

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v2]

2024-04-15 Thread Volodymyr Paprotski
On Fri, 5 Apr 2024 07:19:28 GMT, Jatin Bhateja wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> remove use of jdk.crypto.ec > > src/hotspot/cpu/x86/stubGenerator_x86_64_p

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v2]

2024-04-15 Thread Volodymyr Paprotski
On Thu, 11 Apr 2024 17:15:21 GMT, Anthony Scarpino wrote: >>> In `ECOperations.java`, if I understand this correctly, it is to replace >>> the existing `PointMultiplier` with a Montgomery-based PointMultiplier. But >>> when I look at the code, I see both are still options. If I read this >>> cor

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v2]

2024-04-15 Thread Volodymyr Paprotski
On Wed, 10 Apr 2024 23:56:52 GMT, Volodymyr Paprotski wrote: > Few early comments. > > Please update the copyright year of all the modified files. > > You can even consider splitting this into two patches, Java side changes in > one and x86 optimized intrinsic in ne

Re: RFR: 8329538: Accelerate P256 on x86_64 using Montgomery intrinsic [v3]

2024-04-15 Thread Volodymyr Paprotski
ntBench.EC.generateSecret ECDH 256 > EC thrpt 3 1346.523 ± 28.722 ops/s > Benchmark (isMontBench) Mode Cnt Score > Error Units > PolynomialP256Bench.benchMultiply true thrpt 3 1919.574
