On Mon, 21 Aug 2023 at 18:18, Richard Henderson
<richard.hender...@linaro.org> wrote:
>
> Inspired by Ard Biesheuvel's RFC patches [1] for accelerating
> carry-less multiply under emulation.
>
> Changes for v3:
>   * Update target/i386 ops_sse.h.
>   * Apply r-b.
>
> Changes for v2:
>   * Only accelerate clmul_64; keep generic helpers for other sizes.
>   * Drop most of the Int128 interfaces, except for clmul_64.
>   * Use the same acceleration format as aes-round.h.
>
>
> r~
>
>
> [1] https://patchew.org/QEMU/20230601123332.3297404-1-a...@kernel.org/
>
>
> Richard Henderson (19):
>   crypto: Add generic 8-bit carry-less multiply routines
>   target/arm: Use clmul_8* routines
>   target/s390x: Use clmul_8* routines
>   target/ppc: Use clmul_8* routines
>   crypto: Add generic 16-bit carry-less multiply routines
>   target/arm: Use clmul_16* routines
>   target/s390x: Use clmul_16* routines
>   target/ppc: Use clmul_16* routines
>   crypto: Add generic 32-bit carry-less multiply routines
>   target/arm: Use clmul_32* routines
>   target/s390x: Use clmul_32* routines
>   target/ppc: Use clmul_32* routines
>   crypto: Add generic 64-bit carry-less multiply routine
>   target/arm: Use clmul_64
>   target/i386: Use clmul_64
>   target/s390x: Use clmul_64
>   target/ppc: Use clmul_64
>   host/include/i386: Implement clmul.h
>   host/include/aarch64: Implement clmul.h
>

OK, I did the OpenSSL benchmark this time, using a x86_64 cross build
on arm64/ThunderX2, and the speedup is 7x (\o/)

Tested-by: Ard Biesheuvel <a...@kernel.org>
Acked-by: Ard Biesheuvel <a...@kernel.org>



Distro qemu (no acceleration):

$ qemu-x86_64 --version
qemu-x86_64 version 7.2.4 (Debian 1:7.2+dfsg-7+deb12u1)

$ apps/openssl speed -evp aes-128-gcm
version: 3.2.0-dev
built on: Mon Aug 21 17:57:37 2023 UTC
options: bn(64,64)
compiler: x86_64-linux-gnu-gcc -pthread -m64 -Wa,--noexecstack -Wall
-O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_BUILDING_OPENSSL
-DNDEBUG
CPUINFO: OPENSSL_ia32cap=0xfed8320b0fcbfffd:0x8001020c01d843a9
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes
8192 bytes  16384 bytes
AES-128-GCM       8856.13k    13820.95k    17375.49k    16826.37k
16870.06k    17208.66k


QEMU built with this series applied onto latest master:

$ ~/build/qemu/build/qemu-x86_64 apps/openssl speed -evp aes-128-gcm
version: 3.2.0-dev
built on: Mon Aug 21 17:57:37 2023 UTC
options: bn(64,64)
compiler: x86_64-linux-gnu-gcc -pthread -m64 -Wa,--noexecstack -Wall
-O3 -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_BUILDING_OPENSSL
-DNDEBUG
CPUINFO: OPENSSL_ia32cap=0xfffa320b0fcbfffd:0x8041020c01dc47a9
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes
8192 bytes  16384 bytes
AES-128-GCM      14237.01k    34176.34k    70633.13k    97372.84k
119668.74k   122049.88k

Reply via email to