John-Mark Gurney:
> So, as I was working on FreeBSD's implementation of gmac.c, I noticed
> that I was able to get a significant speed up by using a mask instead
> of an if branch in ghash_gfmul in gmac.c from OpenBSD...
>
> Add a mask var and replace the code between the comments
> "update Z" and "update V" w/:
> mask = !!(x[i >> 3] & (1 << (~i & 7)));
> mask = ~(mask - 1);
>
> z[0] ^= v[0] & mask;
> z[1] ^= v[1] & mask;
> z[2] ^= v[2] & mask;
> z[3] ^= v[3] & mask;
>
> And you should see a nice performance increase...

I tried this on a Soekris net6501-50 and the performance increase
was around 1.3%. (I set up an ESP transport association with
AES-128-GMAC and pushed UDP traffic with tcpbench over it.)
A look at the generated amd64 assembly code shows that the change
indeed removes a branch. What's pretty shocking is that this code

	mul = v[3] & 1;
	...
	v[0] = (v[0] >> 1) ^ (0xe1000000 * mul);

is turned into an actual imul instruction by GCC. I used the same
masking approach to get rid of the multiplication, but the improvement
was minuscule (<1%).
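
For illustration, here is a minimal self-contained sketch of the two
variants side by side. It is not the actual gmac.c code: the function
names gfmul_branch, gfmul_mask and gfmul_agree are made up for this
example, the byte-order conversion is left out, and Y is assumed to be
in host order already.

```c
#include <stdint.h>
#include <string.h>

#define GMAC_BLOCK_LEN	16

/* Reference variant: data-dependent branch plus a multiply by 0 or 1. */
static void
gfmul_branch(const uint8_t *x, const uint32_t *y, uint32_t *z)
{
	uint32_t v[4], mul;
	int i;

	memcpy(v, y, sizeof(v));
	memset(z, 0, 4 * sizeof(uint32_t));

	for (i = 0; i < GMAC_BLOCK_LEN * 8; i++) {
		/* update Z */
		if (x[i >> 3] & (1 << (~i & 7))) {
			z[0] ^= v[0];
			z[1] ^= v[1];
			z[2] ^= v[2];
			z[3] ^= v[3];
		}

		/* update V */
		mul = v[3] & 1;
		v[3] = (v[2] << 31) | (v[3] >> 1);
		v[2] = (v[1] << 31) | (v[2] >> 1);
		v[1] = (v[0] << 31) | (v[1] >> 1);
		v[0] = (v[0] >> 1) ^ (0xe1000000 * mul);
	}
}

/* Masked variant: neither a branch nor a multiply in the loop body. */
static void
gfmul_mask(const uint8_t *x, const uint32_t *y, uint32_t *z)
{
	uint32_t v[4], mask;
	int i;

	memcpy(v, y, sizeof(v));
	memset(z, 0, 4 * sizeof(uint32_t));

	for (i = 0; i < GMAC_BLOCK_LEN * 8; i++) {
		/* update Z: mask is all ones if bit i of x is set, else 0 */
		mask = !!(x[i >> 3] & (1 << (~i & 7)));
		mask = ~(mask - 1);
		z[0] ^= v[0] & mask;
		z[1] ^= v[1] & mask;
		z[2] ^= v[2] & mask;
		z[3] ^= v[3] & mask;

		/* update V: mask is all ones if the low bit shifts out */
		mask = 0 - (v[3] & 1);
		v[3] = (v[2] << 31) | (v[3] >> 1);
		v[2] = (v[1] << 31) | (v[2] >> 1);
		v[1] = (v[0] << 31) | (v[1] >> 1);
		v[0] = (v[0] >> 1) ^ (0xe1000000 & mask);
	}
}

/* Returns 1 if both variants produce the same product for x and y. */
static int
gfmul_agree(const uint8_t *x, const uint32_t *y)
{
	uint32_t a[4], b[4];

	gfmul_branch(x, y, a);
	gfmul_mask(x, y, b);
	return memcmp(a, b, sizeof(a)) == 0;
}
```

Both functions compute the same product; only the way the conditional
XORs are expressed differs. Whether the compiler really emits
branch-free code for the masked variant still has to be checked in the
generated assembly.
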
> I also have an implementation of ghash that does a 4 bit lookup table
> version with the table split between cache lines in p4 at:
> https://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/opencrypto/sys/opencrypto/gfmult.c&REV=4
I'll have to look at this, but haven't there been increasing
misgivings about table implementations for GHASH because of timing
attacks?
--
Christian "naddy" Weisgerber [email protected]