Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-16 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 23:51:22 GMT, Vladimir Ivanov wrote: >> Added a comment, hopefully less confusing. > > On a second thought, passing derived pointers as arguments doesn't mix well > with safepoint awareness. > (And this stub eventually has to become safepoint aware.) > Deriving a pointer insi

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-16 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 19:38:56 GMT, Volodymyr Paprotski wrote: >>> On other hand, there are functions like poly1305_multiply8_avx512 and >>> poly1305_process_blocks_avx512 that use a lot of temp registers. I think it >>> makes sense to keep those as 'function-header declarations'. >> >> I agree

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-16 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 19:30:23 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains 23 commits: >> >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - Vladimir's re

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-15 Thread Vladimir Ivanov
On Mon, 14 Nov 2022 17:58:36 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz
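The quoted PR description says the main optimization is to process 16 message blocks at a time. One standard way to get that parallelism is assumed here rather than taken from the stub, whose own comments describe its exact scheme. It works by unrolling the Poly1305 recurrence a = (a + m_i) * r mod p over k blocks, which gives (a + m_0) r^k + m_1 r^(k-1) + ... + m_(k-1) r. The k products no longer depend on each other, so they can live in separate vector lanes.

The minimal Java sketch below uses `BigInteger` to check that the serial and unrolled forms agree. The class and method names are hypothetical. Key clamping, byte decoding and the final `+ s` step are left out.

```java
import java.math.BigInteger;

// Hypothetical sketch: serial Poly1305 accumulation vs. a k-way
// "powers of r" formulation. Each block m_i is assumed to be the
// little-endian 16-byte block value plus 2^128 (the pad bit).
class PolyPowersSketch {
    // p = 2^130 - 5
    static final BigInteger P =
        BigInteger.ONE.shiftLeft(130).subtract(BigInteger.valueOf(5));

    // Serial Horner form: a = (a + m_i) * r mod p for each block.
    static BigInteger serial(BigInteger a, BigInteger r, BigInteger[] m) {
        for (BigInteger mi : m) {
            a = a.add(mi).multiply(r).mod(P);
        }
        return a;
    }

    // k blocks per step: a' = (a + m_0) r^k + m_1 r^(k-1) + ... + m_(k-1) r.
    // The k products are independent, which is what lets SIMD lanes
    // compute them in parallel. Assumes m.length is a multiple of k.
    static BigInteger unrolled(BigInteger a, BigInteger r, BigInteger[] m, int k) {
        BigInteger[] pow = new BigInteger[k + 1]; // pow[j] = r^j mod p
        pow[0] = BigInteger.ONE;
        for (int j = 1; j <= k; j++) {
            pow[j] = pow[j - 1].multiply(r).mod(P);
        }
        for (int i = 0; i < m.length; i += k) {
            BigInteger acc = a.add(m[i]).multiply(pow[k]);
            for (int j = 1; j < k; j++) {
                acc = acc.add(m[i + j].multiply(pow[k - j]));
            }
            a = acc.mod(P); // single reduction per group of k blocks
        }
        return a;
    }
}
```

The trade-off is that the powers r^2 .. r^k must be precomputed once per key. After that, each group of k blocks needs only independent multiplies and one reduction.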

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-15 Thread Vladimir Ivanov
On Tue, 15 Nov 2022 19:38:26 GMT, Volodymyr Paprotski wrote: >> Ah, got it. Worth elaborating that in the comments. Otherwise, they confuse >> rather than help: >> >> // void processBlocks(byte[] input, int len, int[5] a, int[5] r) >> const Register input= rdi; //input+offset >> c
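For reference, the per-block work behind the quoted `processBlocks(byte[] input, int len, int[5] a, int[5] r)` signature can be modelled in plain Java. This is a hypothetical sketch, not the JDK code. It keeps the accumulator and `r` as `BigInteger` values instead of limb arrays. It also takes an explicit `offset`, because the stub itself is handed a pointer that already points at the first byte to process (the `//input+offset` comment quoted above).

```java
import java.math.BigInteger;

// Hypothetical scalar model of processBlocks. BigInteger stands in for
// the limb arrays, and an explicit offset stands in for the derived
// input+offset pointer the stub receives.
class ProcessBlocksSketch {
    // p = 2^130 - 5
    static final BigInteger P =
        BigInteger.ONE.shiftLeft(130).subtract(BigInteger.valueOf(5));
    static final int BLOCK = 16;

    // Processes len / 16 full blocks starting at input[offset] and returns
    // the updated accumulator. A trailing partial block is not touched;
    // the assumption is that the caller handles it separately.
    static BigInteger processBlocks(byte[] input, int offset, int len,
                                    BigInteger a, BigInteger r) {
        for (int pos = offset; pos + BLOCK <= offset + len; pos += BLOCK) {
            // Reverse the little-endian block into big-endian order for BigInteger.
            byte[] be = new byte[BLOCK];
            for (int i = 0; i < BLOCK; i++) {
                be[i] = input[pos + BLOCK - 1 - i];
            }
            // Full blocks carry the extra 2^128 pad bit.
            BigInteger m = new BigInteger(1, be).setBit(128);
            a = a.add(m).multiply(r).mod(P);
        }
        return a;
    }
}
```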

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-15 Thread Vladimir Ivanov
On Tue, 15 Nov 2022 17:42:08 GMT, Volodymyr Paprotski wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 384: >> >>> 382: void StubGenerator::poly1305_limbs(const Register limbs, const >>> Register a0, const Register a1, const Register a2, bool only128) >>> 383: { >>> 384: const
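The `poly1305_limbs` helper under discussion loads an accumulator held in limb form into the registers `a0`, `a1` and `a2`. The hypothetical Java sketch below illustrates that kind of repacking, together with the inverse split. It assumes five radix-2^26 limbs, as the `int[5]` arrays in the quoted `processBlocks` signature suggest, each already normalized to fewer than 26 bits. It packs the resulting 130-bit value into three 64-bit words. These excerpts do not show whether the stub uses exactly this layout, or how its `only128` flag changes it.

```java
// Hypothetical model: a 130-bit value held as five radix-2^26 limbs
// (limb i carries bits 26*i .. 26*i+25) repacked into three 64-bit words:
// w0 = bits 0..63, w1 = bits 64..127, w2 = bits 128..129.
// Assumes every limb is already normalized to fewer than 26 bits.
class LimbPackSketch {
    static long[] pack(long[] limbs) {
        long l0 = limbs[0], l1 = limbs[1], l2 = limbs[2], l3 = limbs[3], l4 = limbs[4];
        long w0 = l0 | (l1 << 26) | (l2 << 52);          // l2 bits 12..25 shift out
        long w1 = (l2 >>> 12) | (l3 << 14) | (l4 << 40); // l4 bits 24..25 shift out
        long w2 = l4 >>> 24;                             // bits 128..129
        return new long[] {w0, w1, w2};
    }

    // Inverse: split the three words back into five 26-bit limbs.
    static long[] unpack(long w0, long w1, long w2) {
        long mask = (1L << 26) - 1;
        return new long[] {
            w0 & mask,
            (w0 >>> 26) & mask,
            ((w0 >>> 52) | (w1 << 12)) & mask,
            (w1 >>> 14) & mask,
            ((w1 >>> 40) | (w2 << 24)) & mask
        };
    }
}
```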

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-15 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 00:43:16 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains 23 commits: >> >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - Vladimir's re

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-15 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 00:16:19 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains 23 commits: >> >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - Vladimir's re

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-15 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 00:45:54 GMT, Vladimir Ivanov wrote: >> library_call.cpp takes care of that, it passes the address of 0'th element >> to the stub. > > Ah, got it. Worth elaborating that in the comments. Otherwise, they confuse > rather than help: > > // void processBlocks(byte[] input, i

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-15 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 17:42:08 GMT, Volodymyr Paprotski wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 384: >> >>> 382: void StubGenerator::poly1305_limbs(const Register limbs, const >>> Register a0, const Register a1, const Register a2, bool only128) >>> 383: { >>> 384: const

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-15 Thread Volodymyr Paprotski
On Tue, 15 Nov 2022 00:06:40 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains 23 commits: >> >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - Vladimir's re

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-14 Thread Vladimir Ivanov
On Mon, 14 Nov 2022 17:58:36 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-14 Thread Vladimir Ivanov
On Tue, 15 Nov 2022 00:25:46 GMT, Sandhya Viswanathan wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 387: >> >>> 385: const Register t2 = r14; >>> 386: >>> 387: __ movq(a0, Address(limbs, 0)); >> >> I don't understand how it works. `limbs` comes directly from `c_rarg2` a

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-14 Thread Sandhya Viswanathan
On Tue, 15 Nov 2022 00:10:35 GMT, Vladimir Ivanov wrote: >> Volodymyr Paprotski has updated the pull request with a new target base due >> to a merge or a rebase. The pull request now contains 23 commits: >> >> - Merge remote-tracking branch 'origin/master' into avx512-poly >> - Vladimir's re

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-14 Thread Vladimir Ivanov
On Mon, 14 Nov 2022 17:58:36 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-14 Thread Volodymyr Paprotski
On Mon, 14 Nov 2022 17:58:36 GMT, Volodymyr Paprotski wrote: >> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 >> message blocks at a time. For more details, left a lot of comments in >> `macroAssembler_x86_poly.cpp`. >> >> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v16]

2022-11-14 Thread Volodymyr Paprotski
> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16 > message blocks at a time. For more details, left a lot of comments in > `macroAssembler_x86_poly.cpp`. > > - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and > java. > - Would like to add an `I