Inspired by Ard Biesheuvel's RFC patches [1] for accelerating carry-less multiply under emulation.
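
For readers unfamiliar with the operation, here is a minimal sketch of a
64x64->128 carry-less multiply: a shift-and-xor loop with no carries between
bit positions. This is illustration only, not the code from the series. The
names clmul128_sketch and clmul_64_sketch are hypothetical, and the result is
returned as a plain struct rather than the Int128 that the series uses.

```c
#include <stdint.h>

/* Hypothetical 128-bit result type; the series itself uses Int128. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} clmul128_sketch;

/*
 * Carry-less (polynomial, GF(2)) multiply of two 64-bit values.
 * For every set bit i of m, xor (n << i) into the 128-bit result.
 */
static clmul128_sketch clmul_64_sketch(uint64_t n, uint64_t m)
{
    clmul128_sketch r = { 0, 0 };

    for (int i = 0; i < 64; i++) {
        if ((m >> i) & 1) {
            r.lo ^= n << i;
            /* A shift by 64 is undefined in C, so skip the i == 0 case. */
            if (i != 0) {
                r.hi ^= n >> (64 - i);
            }
        }
    }
    return r;
}
```

For example, clmul_64_sketch(3, 3) gives lo == 5 and hi == 0, since
(x + 1)^2 = x^2 + 1 over GF(2). A host-accelerated version would replace the
loop with a single instruction where the host provides one.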
This is less polished than the AES patch set:

(1) Should I split HAVE_CLMUL_ACCEL into per-width HAVE_CLMUL{N}_ACCEL?
    The "_generic" and "_accel" split is different from aes-round.h
    because of the difference in support for different widths, and it
    means that each host accel has more boilerplate.

(2) Should I bother trying to accelerate anything other than 64x64->128?
    That seems to be the one that GCM really wants anyway.  I'd keep all
    of the sizes implemented generically, since that centralizes the 3
    target implementations.

(3) The use of Int128 isn't fantastic -- better would be a vector type,
    though that has its own special problems for ppc64le (see the
    endianness hoops within aes-round.h).  Perhaps leave things in
    env memory, like I was mostly able to do with AES?

(4) No guest test case(s).


r~


[1] https://patchew.org/QEMU/20230601123332.3297404-1-a...@kernel.org/

Richard Henderson (18):
  crypto: Add generic 8-bit carry-less multiply routines
  target/arm: Use clmul_8* routines
  target/s390x: Use clmul_8* routines
  target/ppc: Use clmul_8* routines
  crypto: Add generic 16-bit carry-less multiply routines
  target/arm: Use clmul_16* routines
  target/s390x: Use clmul_16* routines
  target/ppc: Use clmul_16* routines
  crypto: Add generic 32-bit carry-less multiply routines
  target/arm: Use clmul_32* routines
  target/s390x: Use clmul_32* routines
  target/ppc: Use clmul_32* routines
  crypto: Add generic 64-bit carry-less multiply routine
  target/arm: Use clmul_64
  target/s390x: Use clmul_64
  target/ppc: Use clmul_64
  host/include/i386: Implement clmul.h
  host/include/aarch64: Implement clmul.h

 host/include/aarch64/host/cpuinfo.h      |   1 +
 host/include/aarch64/host/crypto/clmul.h | 230 +++++++++++++++++++++++
 host/include/generic/host/crypto/clmul.h |  28 +++
 host/include/i386/host/cpuinfo.h         |   1 +
 host/include/i386/host/crypto/clmul.h    | 187 ++++++++++++++++++
 host/include/x86_64/host/crypto/clmul.h  |   1 +
 include/crypto/clmul.h                   | 123 ++++++++++++
 target/arm/tcg/vec_internal.h            |  11 --
 crypto/clmul.c                           | 163 ++++++++++++++++
 target/arm/tcg/mve_helper.c              |  16 +-
 target/arm/tcg/vec_helper.c              | 112 ++---------
 target/ppc/int_helper.c                  |  63 +++----
 target/s390x/tcg/vec_int_helper.c        | 175 +++++++----------
 util/cpuinfo-aarch64.c                   |   4 +-
 util/cpuinfo-i386.c                      |   1 +
 crypto/meson.build                       |   9 +-
 16 files changed, 865 insertions(+), 260 deletions(-)
 create mode 100644 host/include/aarch64/host/crypto/clmul.h
 create mode 100644 host/include/generic/host/crypto/clmul.h
 create mode 100644 host/include/i386/host/crypto/clmul.h
 create mode 100644 host/include/x86_64/host/crypto/clmul.h
 create mode 100644 include/crypto/clmul.h
 create mode 100644 crypto/clmul.c

-- 
2.34.1