Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v22]

2022-11-22 Thread Volodymyr Paprotski
On Thu, 17 Nov 2022 20:42:27 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v21]

2022-11-22 Thread Volodymyr Paprotski
On Tue, 22 Nov 2022 15:21:44 GMT, Tobias Hartmann wrote: >> @iwanowww Hope the extra tests passed? (Or do you have to re-run them on the >> latest patch again?) > > I fixed the test issue with > [JDK-8297382](https://bugs.openjdk.org/browse/JDK-8297382) but this also > caused a regression with

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v21]

2022-11-22 Thread Tobias Hartmann
On Mon, 21 Nov 2022 17:42:28 GMT, Volodymyr Paprotski wrote: >> Overall, looks good. Just one minor cleanup suggestion. >> >> I've submitted the latest patch for testing (hs-tier1 - hs-tier4). > > @iwanowww Hope the extra tests passed? (Or do you have to re-run them on the > latest patch again?

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v22]

2022-11-21 Thread David Holmes
On Thu, 17 Nov 2022 20:42:27 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v22]

2022-11-21 Thread Vladimir Ivanov
On Thu, 17 Nov 2022 20:42:27 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v21]

2022-11-21 Thread Volodymyr Paprotski
On Thu, 17 Nov 2022 19:32:28 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> vzeroall, no spill, reg re-map > > Overall, looks good. Just one minor cleanup suggestion. > > I've submitte

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v21]

2022-11-17 Thread Volodymyr Paprotski
On Thu, 17 Nov 2022 19:30:14 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> vzeroall, no spill, reg re-map > > src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 377: > >> 375: _

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v22]

2022-11-17 Thread Volodymyr Paprotski
> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 > message blocks at a time. For more details, left a lot of comments in > `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and > java. > - Would like to add an `I

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v21]

2022-11-17 Thread Vladimir Ivanov
On Thu, 17 Nov 2022 03:23:49 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Volodymyr Paprotski
On Wed, 16 Nov 2022 23:41:32 GMT, Volodymyr Paprotski wrote: >> Yes, please. And for the upper half of register file, just code it as a loop >> over register range: >> >> for (int rxmm_num = 16; rxmm_num < 30; rxmm_num++) { >> XMMRegister rxmm = as_XMMRegister(rxmm_num); >> __ vpxorq(rxmm,

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Volodymyr Paprotski
On Wed, 16 Nov 2022 23:16:14 GMT, Volodymyr Paprotski wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 756: >> >>> 754: >>> 755: // Store R^8-R for later use >>> 756: __ evmovdquq(Address(rsp, 64*0), B0, Assembler::AVX_512bit); >> >> Could these vector spills be eliminated?

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v21]

2022-11-16 Thread Volodymyr Paprotski
On Thu, 17 Nov 2022 03:19:15 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v21]

2022-11-16 Thread Volodymyr Paprotski
> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 > message blocks at a time. For more details, left a lot of comments in > `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and > java. > - Would like to add an `I

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Volodymyr Paprotski
On Wed, 16 Nov 2022 23:39:00 GMT, Vladimir Ivanov wrote: >> ah.. I remember thinking about doing that.. `vzeroall` isn't encoded yet and >> I figured since I already have to do the xmm16-29, might as well do them >> all.. should I add that instruction too? > > Yes, please. And for the upper half

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Vladimir Ivanov
On Wed, 16 Nov 2022 23:14:45 GMT, Volodymyr Paprotski wrote: >> Or simply switch to `vzeroall` for `xmm0` - `xmm15`. > > ah.. I remember thinking about doing that.. `vzeroall` isn't encoded yet and I > figured since I already have to do the xmm16-29, might as well do them all.. > should I add th

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Volodymyr Paprotski
On Wed, 16 Nov 2022 23:08:16 GMT, Vladimir Ivanov wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 917: >> >>> 915: // Cleanup >>> 916: __ vpxorq(xmm0, xmm0, xmm0, Assembler::AVX_512bit); >>> 917: __ vpxorq(xmm1, xmm1, xmm1, Assembler::AVX_512bit); >> >> You could use T0,

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Volodymyr Paprotski
On Wed, 16 Nov 2022 23:12:28 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> redo register alloc with explicit func params > > src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 756:

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Vladimir Ivanov
On Wed, 16 Nov 2022 20:52:14 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Vladimir Ivanov
On Wed, 16 Nov 2022 22:47:37 GMT, Sandhya Viswanathan wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> redo register alloc with explicit func params > > src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Sandhya Viswanathan
On Wed, 16 Nov 2022 20:52:14 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14]

2022-11-16 Thread Volodymyr Paprotski
On Fri, 11 Nov 2022 01:43:46 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> live review with Sandhya > > Overall, it looks good. @iwanowww Answered your review comments, please take a

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v17]

2022-11-16 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 19:44:16 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains 25 commits: >> >> - Vladimir's review comments >> - Merge remote-tracking branch 'origin/master' int

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-16 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 23:51:22 GMT, Vladimir Ivanov wrote: >> Added a comment, hopefully less confusing. > > On second thought, passing derived pointers as arguments doesn't mix well > with safepoint awareness. > (And this stub eventually has to become safepoint aware.) > Deriving a pointer insi

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-16 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 19:38:56 GMT, Volodymyr Paprotski wrote: >>> On other hand, there are functions like poly1305_multiply8_avx512 and >>> poly1305_process_blocks_avx512 that use a lot of temp registers. I think it >>> makes sense to keep those as 'function-header declarations'. >> >> I agree

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-16 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 19:30:23 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains 23 commits: >> >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - Vladimir's re

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Volodymyr Paprotski
> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 > message blocks at a time. For more details, left a lot of comments in > `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and > java. > - Would like to add an `I

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v17]

2022-11-15 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 19:41:25 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains 25 commits: >> >> - Vladimir's review comments >> - Merge remote-tracking branch 'origin/master' int

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-15 Thread Vladimir Ivanov
On Mon, 14 Nov 2022 17:58:36 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v17]

2022-11-15 Thread Vladimir Ivanov
On Tue, 15 Nov 2022 19:43:11 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-15 Thread Vladimir Ivanov
On Tue, 15 Nov 2022 19:38:26 GMT, Volodymyr Paprotski wrote: >> Ah, got it. Worth elaborating that in the comments. Otherwise, they confuse >> rather than help: >> >> // void processBlocks(byte[] input, int len, int[5] a, int[5] r) >> const Register input= rdi; //input+offset >> c

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-15 Thread Vladimir Ivanov
On Tue, 15 Nov 2022 17:42:08 GMT, Volodymyr Paprotski wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 384: >> >>> 382: void StubGenerator::poly1305_limbs(const Register limbs, const >>> Register a0, const Register a1, const Register a2, bool only128) >>> 383: { >>> 384: const

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v19]

2022-11-15 Thread Volodymyr Paprotski
> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 > message blocks at a time. For more details, left a lot of comments in > `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and > java. > - Would like to add an `I

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v18]

2022-11-15 Thread Volodymyr Paprotski
> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 > message blocks at a time. For more details, left a lot of comments in > `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and > java. > - Would like to add an `I

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-15 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 00:43:16 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains 23 commits: >> >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - Vladimir's re

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-15 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 00:16:19 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains 23 commits: >> >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - Vladimir's re

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-15 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 00:45:54 GMT, Vladimir Ivanov wrote: >> library_call.cpp takes care of that, it passes the address of 0'th element >> to the stub. > > Ah, got it. Worth elaborating that in the comments. Otherwise, they confuse > rather than help: > > // void processBlocks(byte[] input, i

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-15 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 17:42:08 GMT, Volodymyr Paprotski wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 384: >> >>> 382: void StubGenerator::poly1305_limbs(const Register limbs, const >>> Register a0, const Register a1, const Register a2, bool only128) >>> 383: { >>> 384: const

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v17]

2022-11-15 Thread Volodymyr Paprotski
> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 > message blocks at a time. For more details, left a lot of comments in > `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and > java. > - Would like to add an `I

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-15 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 00:06:40 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains 23 commits: >> >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - Vladimir's re

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v3]

2022-11-14 Thread Tobias Hartmann
On Mon, 24 Oct 2022 09:02:58 GMT, Tobias Hartmann wrote: >> Volodymyr Paprotski has refreshed the contents of this pull request, and >> previous commits have been removed. The incremental views will show >> differences compared to the previous content of the PR. The pull request >> contains on

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-14 Thread Vladimir Ivanov
On Mon, 14 Nov 2022 17:58:36 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-14 Thread Vladimir Ivanov
On Tue, 15 Nov 2022 00:25:46 GMT, Sandhya Viswanathan wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 387: >> >>> 385: const Register t2 = r14; >>> 386: >>> 387: __ movq(a0, Address(limbs, 0)); >> >> I don't understand how it works. `limbs` comes directly from `c_rarg2` a

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-14 Thread Sandhya Viswanathan
On Tue, 15 Nov 2022 00:10:35 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains 23 commits: >> >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - Vladimir's re

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-14 Thread Vladimir Ivanov
On Mon, 14 Nov 2022 17:58:36 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-14 Thread Volodymyr Paprotski
On Mon, 14 Nov 2022 17:58:36 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14]

2022-11-14 Thread Vladimir Ivanov
On Mon, 14 Nov 2022 17:48:25 GMT, Volodymyr Paprotski wrote: >> Yeah, just got to about the same conclusion by looking at the preprocessor >> `-E` output.. it's declared in the header, but not defined in the 'cpp' >> file.. One would think that that's a compile error, but it's been more than a >

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v15]

2022-11-14 Thread Volodymyr Paprotski
On Fri, 11 Nov 2022 17:56:55 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-14 Thread Volodymyr Paprotski
> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 > message blocks at a time. For more details, left a lot of comments in > `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and > java. > - Would like to add an `I

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14]

2022-11-14 Thread Volodymyr Paprotski
On Fri, 11 Nov 2022 20:46:57 GMT, Volodymyr Paprotski wrote: >> It's not specific to `andq`: there's a huge `#ifdef` block around the >> definitions in `assembler_x86.hpp` (lines 12201 - 13773; and there's even a >> nested `#ifdef _LP64` (lines 13515-13585)!) , but declarations aren't >> guard

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14]

2022-11-11 Thread Volodymyr Paprotski
On Fri, 11 Nov 2022 20:34:34 GMT, Vladimir Ivanov wrote: >> I am mystified at how it actually gets removed from the `assembler_x86.o` >> object on 32-bit.. The only reliable/portable way _would_ be with `#ifdef` >> but it's not there.. so.. code-generation? `sed`-like preprocessing? Can one >>

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14]

2022-11-11 Thread Vladimir Ivanov
On Fri, 11 Nov 2022 20:08:27 GMT, Volodymyr Paprotski wrote: >> Right, `addq` instructions are x64-specific. I was confused because >> `assembler_x86.hpp` doesn't declare them as such which is a bug. > > I am mystified at how it actually gets removed from the `assembler_x86.o` > object on 32-bi

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14]

2022-11-11 Thread Volodymyr Paprotski
On Fri, 11 Nov 2022 19:56:40 GMT, Vladimir Ivanov wrote: >> I believe it's needed. >> >> TLDR.. A couple of check-ins ago, I broke the 32-bit build, and that was the >> 'easy' fix.. > > Right, `addq` instructions are x64-specific. I was confused because > `assembler_x86.hpp` doesn't declare them

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14]

2022-11-11 Thread Vladimir Ivanov
On Fri, 11 Nov 2022 18:08:50 GMT, Volodymyr Paprotski wrote: >> src/hotspot/cpu/x86/macroAssembler_x86.hpp line 733: >> >>> 731: void andptr(Register src1, Register src2) { LP64_ONLY(andq(src1, >>> src2)) NOT_LP64(andl(src1, src2)) ; } >>> 732: >>> 733: #ifdef _LP64 >> >> Why is it x64-spec

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v15]

2022-11-11 Thread Volodymyr Paprotski
> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 > message blocks at a time. For more details, left a lot of comments in > `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and > java. > - Would like to add an `I

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v13]

2022-11-11 Thread Volodymyr Paprotski
On Fri, 11 Nov 2022 01:25:07 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> jcheck > > src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 252: > >> 250: private

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14]

2022-11-11 Thread Volodymyr Paprotski
On Fri, 11 Nov 2022 01:26:40 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> live review with Sandhya > > src/hotspot/cpu/x86/macroAssembler_x86.hpp line 733: > >> 731: void andptr(Re

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v13]

2022-11-10 Thread Vladimir Ivanov
On Thu, 10 Nov 2022 22:59:52 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v11]

2022-11-10 Thread Vladimir Ivanov
On Thu, 10 Nov 2022 22:41:31 GMT, Volodymyr Paprotski wrote: >> src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 249: >> >>> 247: @ForceInline >>> 248: @IntrinsicCandidate >>> 249: private void processMultipleBlocks(byte[] input, int offset, int >>> length, lon

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14]

2022-11-10 Thread Vladimir Ivanov
On Fri, 11 Nov 2022 01:14:05 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14]

2022-11-10 Thread Sandhya Viswanathan
On Fri, 11 Nov 2022 01:14:05 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14]

2022-11-10 Thread Sandhya Viswanathan
On Fri, 11 Nov 2022 01:14:05 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v14]

2022-11-10 Thread Volodymyr Paprotski
> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 > message blocks at a time. For more details, left a lot of comments in > `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and > java. > - Would like to add an `I

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v13]

2022-11-10 Thread Volodymyr Paprotski
> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 > message blocks at a time. For more details, left a lot of comments in > `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and > java. > - Would like to add an `I

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v11]

2022-11-10 Thread Volodymyr Paprotski
On Thu, 10 Nov 2022 22:03:24 GMT, Sandhya Viswanathan wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> fix windows and 32b linux builds > > src/hotspot/share/opto/library_call.cpp line 6981: > >> 6979: >>

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v12]

2022-11-10 Thread Volodymyr Paprotski
> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 > message blocks at a time. For more details, left a lot of comments in > `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and > java. > - Would like to add an `I

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v11]

2022-11-10 Thread Sandhya Viswanathan
On Thu, 10 Nov 2022 01:22:04 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v11]

2022-11-09 Thread Volodymyr Paprotski
On Thu, 10 Nov 2022 01:22:04 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v11]

2022-11-09 Thread Volodymyr Paprotski
> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 > message blocks at a time. For more details, left a lot of comments in > `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and > java. > - Would like to add an `I

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v9]

2022-11-09 Thread Volodymyr Paprotski
On Tue, 8 Nov 2022 23:59:42 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> fix 32-bit build > > src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 175: > >> 173: >

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v9]

2022-11-09 Thread Volodymyr Paprotski
On Wed, 9 Nov 2022 15:55:53 GMT, Jatin Bhateja wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> fix 32-bit build > > src/hotspot/cpu/x86/vm_version_x86.cpp line 1181: > >> 1179: #ifdef _LP64 >> 1180: if (s

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v6]

2022-11-09 Thread Volodymyr Paprotski
On Wed, 9 Nov 2022 02:19:29 GMT, Volodymyr Paprotski wrote: >>> Did not split it up into individual constants. The main 'problem' is that >>> Address and ExternalAddress are not compatible. >> >> There's a reason for that and it's because RIP-relative addressing doesn't >> always work, so add

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v10]

2022-11-09 Thread Volodymyr Paprotski
> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 > message blocks at a time. For more details, left a lot of comments in > `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and > java. > - Would like to add an `I

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v9]

2022-11-09 Thread Volodymyr Paprotski
On Wed, 9 Nov 2022 00:23:21 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> fix 32-bit build > > src/hotspot/cpu/x86/macroAssembler_x86.hpp line 970: > >> 968: >> 969: void addmq(int

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v9]

2022-11-09 Thread Volodymyr Paprotski
On Wed, 9 Nov 2022 00:10:48 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> fix 32-bit build > > src/hotspot/share/opto/library_call.cpp line 7014: > >> 7012: const TypeKlassPtr* rkla

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v9]

2022-11-09 Thread Jatin Bhateja
On Tue, 8 Nov 2022 23:21:58 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz t

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v9]

2022-11-08 Thread Volodymyr Paprotski
On Wed, 9 Nov 2022 00:10:48 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> fix 32-bit build > > src/hotspot/share/opto/library_call.cpp line 7014: > >> 7012: const TypeKlassPtr* rkla

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v6]

2022-11-08 Thread Volodymyr Paprotski
On Wed, 9 Nov 2022 00:38:45 GMT, Vladimir Ivanov wrote: >> @iwanowww moved to StubGenerator as suggested.. moving functions to the >> stubGenerator_x86_64.hpp header doesn't seem 'clean' but I think that's the >> pattern. >> >> The constant pool.. stared at it for a while and ended up keeping

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v6]

2022-11-08 Thread Vladimir Ivanov
On Tue, 8 Nov 2022 22:01:19 GMT, Volodymyr Paprotski wrote: > Did not split it up into individual constants. The main 'problem' is that > Address and ExternalAddress are not compatible. There's a reason for that and it's because RIP-relative addressing doesn't always work, so additional regis

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v9]

2022-11-08 Thread Vladimir Ivanov
On Tue, 8 Nov 2022 23:21:58 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz t

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v9]

2022-11-08 Thread Volodymyr Paprotski
> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 > message blocks at a time. For more details, left a lot of comments in > `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and > java. > - Would like to add an `I

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v7]

2022-11-08 Thread Volodymyr Paprotski
On Fri, 4 Nov 2022 17:25:16 GMT, Volodymyr Paprotski wrote: >> src/hotspot/share/opto/library_call.cpp line 7036: >> >>> 7034: assert(r_start, "r array is NULL"); >>> 7035: >>> 7036: Node* call = make_runtime_call(RC_LEAF, >> >> Can we safely change this to `RC_LEAF | RC_NO_FP`? For the C

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v8]

2022-11-08 Thread Volodymyr Paprotski
> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 > message blocks at a time. For more details, left a lot of comments in > `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and > java. > - Would like to add an `I

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v6]

2022-11-08 Thread Volodymyr Paprotski
On Tue, 1 Nov 2022 23:49:17 GMT, Vladimir Ivanov wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2002: >> >>> 2000: } >>> 2001: >>> 2002: address StubGenerator::generate_poly1305_masksCP() { >> >> I suggest to turn it into a C++ literal constant and move the declaration >> next to

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v6]

2022-11-08 Thread Volodymyr Paprotski
On Tue, 1 Nov 2022 23:21:57 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> invalidkeyexception and some review comments > > src/hotspot/share/runtime/globals.hpp line 241: > >> 239:

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5]

2022-11-04 Thread Sandhya Viswanathan
On Fri, 4 Nov 2022 20:59:10 GMT, Volodymyr Paprotski wrote: >> src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 175: >> >>> 173: // Choice of 1024 is arbitrary, need enough data blocks to >>> amortize conversion overhead >>> 174: // and not affect p

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5]

2022-11-04 Thread Volodymyr Paprotski
On Tue, 25 Oct 2022 00:31:07 GMT, Sandhya Viswanathan wrote: >> Volodymyr Paprotski has updated the pull request incrementally with one >> additional commit since the last revision: >> >> extra whitespace character > > src/java.base/share/classes/com/sun/crypto/provider/Poly1305.java line 17

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v7]

2022-11-04 Thread Volodymyr Paprotski
On Fri, 4 Nov 2022 16:28:51 GMT, Jamil Nimeh wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains 12 commits: >> >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - address Jamil's re

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v7]

2022-11-04 Thread Anthony Scarpino
On Fri, 4 Nov 2022 03:20:11 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz t

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v7]

2022-11-04 Thread Jamil Nimeh
On Fri, 4 Nov 2022 03:20:11 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz t

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v7]

2022-11-04 Thread Volodymyr Paprotski
On Tue, 18 Oct 2022 22:51:51 GMT, Sandhya Viswanathan wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains 12 commits: >> >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - address

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5]

2022-11-04 Thread Volodymyr Paprotski
On Fri, 28 Oct 2022 20:58:33 GMT, Volodymyr Paprotski wrote: >> No, going the WhiteBox route was not something I was thinking of. I sought >> feedback from a couple of hotspot-knowledgeable people about the use of WhiteBox >> APIs and both felt that it was not the right way to go. One said that

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5]

2022-11-04 Thread Volodymyr Paprotski
On Wed, 2 Nov 2022 03:16:57 GMT, Jatin Bhateja wrote: >>> And just looking now on uops.info, they seem to have identical timings? >> >> Actual instruction being used (aligned vs unaligned versions) doesn't matter >> much here, because it's a dynamic property of the address being accessed: >> m

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v7]

2022-11-04 Thread Jamil Nimeh
On Fri, 4 Nov 2022 03:20:11 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz t

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v7]

2022-11-03 Thread Volodymyr Paprotski
On Fri, 4 Nov 2022 03:20:11 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz t

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5]

2022-11-03 Thread Volodymyr Paprotski
On Fri, 28 Oct 2022 21:55:59 GMT, Jamil Nimeh wrote: >> I flip-flopped on this.. I already had the code for the exception.. and >> already described the potential fix. So rather than remove the code, pushed >> the described fix. It's always easier to remove the extra field I added. Let >> me

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v7]

2022-11-03 Thread Volodymyr Paprotski
> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 > message blocks at a time. For more details, left a lot of comments in > `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and > java. > - Would like to add an `I

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5]

2022-11-01 Thread Jatin Bhateja
On Tue, 1 Nov 2022 23:04:45 GMT, Vladimir Ivanov wrote: >> Hmm.. interesting. Is this for loading? `evmovdquq` vs `evmovdqaq`? I was >> actually looking at using evmovdqaq but there is no encoding for it yet (And >> just looking now on uops.info, they seem to have identical timings? perhaps >>

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v6]

2022-11-01 Thread vpaprotsk
On Tue, 1 Nov 2022 23:49:17 GMT, Vladimir Ivanov wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2002: >> >>> 2000: } >>> 2001: >>> 2002: address StubGenerator::generate_poly1305_masksCP() { >> >> I suggest to turn it into a C++ literal constant and move the declaration >> next to

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v6]

2022-11-01 Thread Vladimir Ivanov
On Tue, 1 Nov 2022 23:17:46 GMT, Vladimir Ivanov wrote: >> vpaprotsk has updated the pull request incrementally with one additional >> commit since the last revision: >> >> invalidkeyexception and some review comments > > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 2002: > >> 2000: } >

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v5]

2022-11-01 Thread Vladimir Ivanov
On Fri, 28 Oct 2022 20:19:35 GMT, vpaprotsk wrote: > And just looking now on uops.info, they seem to have identical timings? Actual instruction being used (aligned vs unaligned versions) doesn't matter much here, because it's a dynamic property of the address being accessed: misaligned accesse
