Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Volodymyr Paprotski
On Wed, 16 Nov 2022 23:41:32 GMT, Volodymyr Paprotski wrote:
>> Yes, please. And for the upper half of register file, just code it as a loop
>> over register range:
>>
>> for (int rxmm_num = 16; rxmm_num < 30; rxmm_num++) {
>>   XMMRegister rxmm = as_XMMRegister(rxmm_num);
>>   __ vpxorq(rxmm,

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Volodymyr Paprotski
On Wed, 16 Nov 2022 23:16:14 GMT, Volodymyr Paprotski wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 756:
>>
>>> 754:
>>> 755: // Store R^8-R for later use
>>> 756: __ evmovdquq(Address(rsp, 64*0), B0, Assembler::AVX_512bit);
>>
>> Could these vector spills be eliminated?

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Volodymyr Paprotski
On Wed, 16 Nov 2022 23:39:00 GMT, Vladimir Ivanov wrote:
>> ah.. I remember thinking about doing that.. `vzeroall` isn't encoded yet and
>> I figured since I already have to do the xmm16-29, might as well do them
>> all.. should I add that instruction too?
>
> Yes, please. And for the upper half

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Vladimir Ivanov
On Wed, 16 Nov 2022 23:14:45 GMT, Volodymyr Paprotski wrote:
>> Or simply switch to `vzeroall` for `xmm0` - `xmm15`.
>
> ah.. I remember thinking about doing that.. `vzeroall` isn't encoded yet and I
> figured since I already have to do the xmm16-29, might as well do them all..
> should I add th

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Volodymyr Paprotski
On Wed, 16 Nov 2022 23:08:16 GMT, Vladimir Ivanov wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 917:
>>
>>> 915: // Cleanup
>>> 916: __ vpxorq(xmm0, xmm0, xmm0, Assembler::AVX_512bit);
>>> 917: __ vpxorq(xmm1, xmm1, xmm1, Assembler::AVX_512bit);
>>
>> You could use T0,

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Volodymyr Paprotski
On Wed, 16 Nov 2022 23:12:28 GMT, Vladimir Ivanov wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> redo register alloc with explicit func params
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 756:

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Vladimir Ivanov
On Wed, 16 Nov 2022 20:52:14 GMT, Volodymyr Paprotski wrote:
>> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16
>> message blocks at a time. For more details, left a lot of comments in
>> `macroAssembler_x86_poly.cpp`.
>>
>> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Vladimir Ivanov
On Wed, 16 Nov 2022 22:47:37 GMT, Sandhya Viswanathan wrote:
>> Volodymyr Paprotski has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> redo register alloc with explicit func params
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Sandhya Viswanathan
On Wed, 16 Nov 2022 20:52:14 GMT, Volodymyr Paprotski wrote:
>> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16
>> message blocks at a time. For more details, left a lot of comments in
>> `macroAssembler_x86_poly.cpp`.
>>
>> - Added new KAT test for Poly1305 and a fuzz

Re: RFR: 8288047: Accelerate Poly1305 on x86_64 using AVX512 instructions [v20]

2022-11-16 Thread Volodymyr Paprotski
> Handcrafted x86_64 asm for Poly1305. Main optimization is to process 16
> message blocks at a time. For more details, left a lot of comments in
> `macroAssembler_x86_poly.cpp`.
>
> - Added new KAT test for Poly1305 and a fuzz test to compare intrinsic and
> java.
> - Would like to add an `I