On Mon, 21 Aug 2023 at 18:18, Richard Henderson <richard.hender...@linaro.org> wrote:
>
> Inspired by Ard Biesheuvel's RFC patches [1] for accelerating
> carry-less multiply under emulation.
>
> Changes for v3:
>   * Update target/i386 ops_sse.h.
>   * Apply r-b.
>
> Changes for v2:
>   * Only accelerate clmul_64; keep generic helpers for other sizes.
>   * Drop most of the Int128 interfaces, except for clmul_64.
>   * Use the same acceleration format as aes-round.h.
>
>
> r~
>
>
> [1] https://patchew.org/QEMU/20230601123332.3297404-1-a...@kernel.org/
>
>
> Richard Henderson (19):
>   crypto: Add generic 8-bit carry-less multiply routines
>   target/arm: Use clmul_8* routines
>   target/s390x: Use clmul_8* routines
>   target/ppc: Use clmul_8* routines
>   crypto: Add generic 16-bit carry-less multiply routines
>   target/arm: Use clmul_16* routines
>   target/s390x: Use clmul_16* routines
>   target/ppc: Use clmul_16* routines
>   crypto: Add generic 32-bit carry-less multiply routines
>   target/arm: Use clmul_32* routines
>   target/s390x: Use clmul_32* routines
>   target/ppc: Use clmul_32* routines
>   crypto: Add generic 64-bit carry-less multiply routine
>   target/arm: Use clmul_64
>   target/i386: Use clmul_64
>   target/s390x: Use clmul_64
>   target/ppc: Use clmul_64
>   host/include/i386: Implement clmul.h
>   host/include/aarch64: Implement clmul.h
>
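For readers new to the operation: a carry-less multiply is long multiplication with the carries dropped, so partial products are combined with XOR instead of addition (polynomial multiplication over GF(2)). Below is a minimal portable C sketch of a 64x64->128-bit carry-less multiply, roughly what a generic, non-accelerated routine has to compute when no host instruction (such as x86 PCLMULQDQ or AArch64 PMULL) is available. The names clmul64_sketch and clmul128_sketch are hypothetical; this is not the series' code, and per the cover letter its clmul_64 keeps an Int128 interface instead of the struct used here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit result holder (QEMU itself would use Int128). */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} clmul128_sketch;

/*
 * Hypothetical bit-serial 64x64->128-bit carry-less multiply:
 * for every set bit i of b, XOR (a << i) into a 128-bit accumulator.
 */
static clmul128_sketch clmul64_sketch(uint64_t a, uint64_t b)
{
    clmul128_sketch r = { 0, 0 };

    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            /* Low 64 bits of the shifted partial product. */
            r.lo ^= a << i;
            /* Bits shifted past bit 63 land in the high half;
             * skip i == 0 because a >> 64 is undefined in C. */
            if (i != 0) {
                r.hi ^= a >> (64 - i);
            }
        }
    }
    return r;
}
```

For example, clmul64_sketch(3, 3) yields lo = 5, hi = 0, since (x + 1)(x + 1) = x^2 + 1 over GF(2). The loop costs up to 64 iterations per call, which is why mapping clmul_64 onto a single host instruction pays off so visibly under emulation.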
OK, I did the OpenSSL benchmark this time, using an x86_64 cross build on
arm64/ThunderX2, and the speedup is about 7x for large buffers (\o/)

Tested-by: Ard Biesheuvel <a...@kernel.org>
Acked-by: Ard Biesheuvel <a...@kernel.org>

Distro QEMU (no acceleration):

$ qemu-x86_64 --version
qemu-x86_64 version 7.2.4 (Debian 1:7.2+dfsg-7+deb12u1)

$ apps/openssl speed -evp aes-128-gcm
version: 3.2.0-dev
built on: Mon Aug 21 17:57:37 2023 UTC
options: bn(64,64)
compiler: x86_64-linux-gnu-gcc -pthread -m64 -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_ia32cap=0xfed8320b0fcbfffd:0x8001020c01d843a9
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-GCM      8856.13k    13820.95k    17375.49k    16826.37k    16870.06k    17208.66k

QEMU built with this series applied onto latest master:

$ ~/build/qemu/build/qemu-x86_64 apps/openssl speed -evp aes-128-gcm
version: 3.2.0-dev
built on: Mon Aug 21 17:57:37 2023 UTC
options: bn(64,64)
compiler: x86_64-linux-gnu-gcc -pthread -m64 -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_BUILDING_OPENSSL -DNDEBUG
CPUINFO: OPENSSL_ia32cap=0xfffa320b0fcbfffd:0x8041020c01dc47a9
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-GCM     14237.01k    34176.34k    70633.13k    97372.84k   119668.74k   122049.88k
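The headline speedup depends on buffer size. This small C sketch recomputes the per-size ratio from the two AES-128-GCM rows in the output; the throughput values are copied verbatim, while the array and function names (distro_qemu, patched_qemu, speedup, print_speedups) are just illustrative.

```c
#include <stddef.h>
#include <stdio.h>

/* Buffer sizes and AES-128-GCM throughput (1000s of bytes/s),
 * copied from the two `openssl speed` runs in this message. */
static const int sizes[] = { 16, 64, 256, 1024, 8192, 16384 };
static const double distro_qemu[] = { 8856.13, 13820.95, 17375.49,
                                      16826.37, 16870.06, 17208.66 };
static const double patched_qemu[] = { 14237.01, 34176.34, 70633.13,
                                       97372.84, 119668.74, 122049.88 };

/* Speedup of the patched build over distro QEMU for column i. */
static double speedup(size_t i)
{
    return patched_qemu[i] / distro_qemu[i];
}

/* Print one "size: ratio" line per buffer size. */
static void print_speedups(void)
{
    for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
        printf("%5d bytes: %.2fx\n", sizes[i], speedup(i));
    }
}
```

Calling print_speedups() shows roughly 1.6x for 16-byte buffers, rising to about 7.1x at 8192 and 16384 bytes, where the GHASH (carry-less multiply) work dominates over per-call overhead.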