On Mon, 21 Aug 2023 at 17:15, Richard Henderson
<richard.hender...@linaro.org> wrote:
>
> On 8/21/23 07:57, Ard Biesheuvel wrote:
> >> Richard Henderson (18):
> >>    crypto: Add generic 8-bit carry-less multiply routines
> >>    target/arm: Use clmul_8* routines
> >>    target/s390x: Use clmul_8* routines
> >>    target/ppc: Use clmul_8* routines
> >>    crypto: Add generic 16-bit carry-less multiply routines
> >>    target/arm: Use clmul_16* routines
> >>    target/s390x: Use clmul_16* routines
> >>    target/ppc: Use clmul_16* routines
> >>    crypto: Add generic 32-bit carry-less multiply routines
> >>    target/arm: Use clmul_32* routines
> >>    target/s390x: Use clmul_32* routines
> >>    target/ppc: Use clmul_32* routines
> >>    crypto: Add generic 64-bit carry-less multiply routine
> >>    target/arm: Use clmul_64
> >>    target/s390x: Use clmul_64
> >>    target/ppc: Use clmul_64
> >>    host/include/i386: Implement clmul.h
> >>    host/include/aarch64: Implement clmul.h
> >>
> >
> > I didn't re-run the OpenSSL benchmark, but the x86 Linux kernel still
> > passes all its crypto selftests when running under TCG emulation on a
> > TX2 arm64 host, so
> >
> > Tested-by: Ard Biesheuvel <a...@kernel.org>
>
> Oh, whoops.  What's missing here?  Any target/i386 changes.
>

Ah yes - I hadn't spotted that. The below seems to do the trick.

--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -2156,7 +2156,10 @@ void glue(helper_pclmulqdq, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s,
     for (i = 0; i < 1 << SHIFT; i += 2) {
         a = v->Q(((ctrl & 1) != 0) + i);
         b = s->Q(((ctrl & 16) != 0) + i);
-        clmulq(&d->Q(i), &d->Q(i + 1), a, b);
+
+        Int128 r = clmul_64(a, b);
+        d->Q(i) = int128_getlo(r);
+        d->Q(i + 1) = int128_gethi(r);
     }
 }

[and the #include added and clmulq() dropped]
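
For anyone following along who hasn't looked at the operation itself, here
is a minimal sketch of what a 64x64 -> 128-bit carry-less multiply computes.
This is a plain bit-by-bit reference, not QEMU's clmul_64() implementation;
the names u128_sketch and clmul_64_sketch are made up for illustration, and
the struct merely stands in for Int128.

```c
#include <stdint.h>

/* Hypothetical stand-in for Int128: low and high 64 bits of the product. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_sketch;

/*
 * Bitwise 64x64 -> 128 carry-less multiply (illustrative name).
 * For every set bit i of 'b', XOR a copy of 'a' shifted left by i
 * into the 128-bit accumulator. XOR replaces addition, so no carries
 * propagate between bit positions.
 */
static u128_sketch clmul_64_sketch(uint64_t a, uint64_t b)
{
    u128_sketch r = { 0, 0 };

    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;
            /*
             * Bits shifted out of the low half land in the high half.
             * Skip i == 0: shifting a 64-bit value by 64 is undefined.
             */
            if (i != 0) {
                r.hi ^= a >> (64 - i);
            }
        }
    }
    return r;
}
```

In the hunk above, int128_getlo(r) and int128_gethi(r) play the role of the
lo and hi fields in this sketch.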

I did a quick RFC4106 benchmark with tcrypt (which doesn't speed up as
much as OpenSSL does, but cross-rebuilding OpenSSL is a bit of a hassle):

no acceleration:

tcrypt: test 7 (160 bit key, 8192 byte blocks): 1547 operations in 1 seconds (12673024 bytes)

AES only:

tcrypt: test 7 (160 bit key, 8192 byte blocks): 1679 operations in 1 seconds (13754368 bytes)

AES and PMULL:

tcrypt: test 7 (160 bit key, 8192 byte blocks): 3298 operations in 1 seconds (27017216 bytes)
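
To put the three runs side by side, the ratios can be worked out with a
couple of trivial helpers. This is only a sketch of the arithmetic; the
function names are made up and nothing here is part of tcrypt.

```c
#include <stdint.h>

/* Bytes processed in a run: operation count times block size in bytes. */
static uint64_t bytes_processed(uint64_t ops, uint64_t block_bytes)
{
    return ops * block_bytes;
}

/* Ratio of two operation counts measured over the same 1-second interval. */
static double speedup(uint64_t ops, uint64_t baseline_ops)
{
    return (double)ops / (double)baseline_ops;
}
```

With the figures above, speedup(1679, 1547) is roughly 1.09 for AES only and
speedup(3298, 1547) is roughly 2.13 for AES plus PMULL, and
bytes_processed(ops, 8192) matches the byte counts tcrypt reports.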
