> An aarch64 implementation of the `MontgomeryIntegerPolynomial256.mult()`
> method and `IntegerPolynomial.conditionalAssign()`. Since 64-bit
> multiplication is not supported on Neon and manually performing this
> operation with 32-bit limbs is slower than with GPRs, a hybrid neon/gpr
> approach is used. Neon instructions are used to compute intermediate values
> used in the last two iterations of the main "loop", while the GPRs compute
> the first few iterations. At the method level this improves performance by
> ~9% and at the API level roughly 5%.
>
> Performance no intrinsic (Apple M1):
>
> Benchmark                          (isMontBench)   Mode  Cnt     Score    Error  Units
> PolynomialP256Bench.benchMultiply           true  thrpt    8  2427.562 ± 24.923  ops/s
> PolynomialP256Bench.benchMultiply          false  thrpt    8  1757.495 ± 41.805  ops/s
> PolynomialP256Bench.benchSquare             true  thrpt    8  2435.202 ± 20.822  ops/s
> PolynomialP256Bench.benchSquare            false  thrpt    8  2420.390 ± 33.594  ops/s
>
> Benchmark                        (algorithm)  (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score    Error  Units
> SignatureBench.ECDSA.sign    SHA256withECDSA        1024          256              thrpt   40  8439.881 ± 29.838  ops/s
> SignatureBench.ECDSA.sign    SHA256withECDSA       16384          256              thrpt   40  7990.614 ± 30.998  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA        1024          256              thrpt   40  2677.737 ±  8.400  ops/s
> SignatureBench.ECDSA.verify  SHA256withECDSA       16384          256              thrpt   40  2619.297 ±  9.737  ops/s
>
> Benchmark                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score   Error  Units
> KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt   40  1905.369 ± 3.745  ops/s
>
> Benchmark                            (algorithm)  (keyLength)  (kpgAlgorithm)  (provider)   Mode  Cnt     Score   Error  Units
> KeyAgreementBench.EC.generateSecret         ECDH          256              EC              thrpt   40  1903.997 ± 4.092  ops/s
>
>
> Performance with intrinsic (Apple M1):
>
> Benchmark                          (isMontBench)   Mode  Cnt     Score    Error  Units
> PolynomialP256Bench.benchMultiply           true  thrpt    8  2676.599 ± 24.722  ops/s
> PolynomialP256Bench.benchMultiply          false  thrpt    8  1770.589 ±  2.584  ops/s
> PolynomialP256Bench.benchSqua...
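
To illustrate the quoted remark that emulating a 64-bit multiplication with 32-bit limbs is slower than using GPRs, here is a minimal Java sketch of the limb decomposition. It is not code from the pull request: the class `Limb32Mul` and the method `mul64x64` are hypothetical names chosen for this example, and the sketch only shows the arithmetic, not how the intrinsic schedules it. A full 64x64 -> 128-bit product needs four 32x32 -> 64-bit partial products plus carry propagation, whereas a general-purpose register pair can produce the same result with one low-half and one high-half multiply.

```java
final class Limb32Mul {
    private static final long MASK32 = 0xFFFFFFFFL;

    /**
     * Unsigned 64x64 -> 128-bit multiply built only from 32x32 -> 64-bit
     * products. Returns {high, low}, the two 64-bit halves of a * b.
     * Hypothetical helper for illustration; not part of the JDK.
     */
    static long[] mul64x64(long a, long b) {
        // Split each operand into two unsigned 32-bit limbs.
        long a0 = a & MASK32, a1 = a >>> 32;
        long b0 = b & MASK32, b1 = b >>> 32;

        // Four partial products; each fits in an unsigned 64-bit value.
        long p00 = a0 * b0;
        long p01 = a0 * b1;
        long p10 = a1 * b0;
        long p11 = a1 * b1;

        // Middle column (weight 2^32): high half of p00 plus the low halves
        // of the cross products. The sum is below 3 * 2^32, so no overflow.
        long mid = (p00 >>> 32) + (p01 & MASK32) + (p10 & MASK32);

        long low = (p00 & MASK32) | (mid << 32);
        // High column (weight 2^64): p11, the high halves of the cross
        // products, and the carry out of the middle column.
        long high = p11 + (p01 >>> 32) + (p10 >>> 32) + (mid >>> 32);
        return new long[] { high, low };
    }

    public static void main(String[] args) {
        long[] r = mul64x64(0xFFFFFFFFFFFFFFFFL, 0xFFFFFFFFFFFFFFFFL);
        System.out.println(Long.toHexString(r[0]) + " " + Long.toHexString(r[1]));
    }
}
```

Counting operations in the sketch (four multiplies, several masks, shifts and adds per 64-bit limb product) gives a rough sense of why the description keeps the first iterations on GPRs and uses Neon only for intermediate values of the last two.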
Ben Perez has updated the pull request incrementally with one additional commit since the last revision:

  added comments to p256 intrinsics, fixed error message in umullv instruction

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/27946/files
  - new: https://git.openjdk.org/jdk/pull/27946/files/05925eaa..e70dc14e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=27946&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=27946&range=05-06

  Stats: 17 lines in 2 files changed: 11 ins; 0 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/27946.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/27946/head:pull/27946

PR: https://git.openjdk.org/jdk/pull/27946
