While working on FreeBSD's implementation of gmac.c, I noticed that I
could get a significant speedup in ghash_gfmul (in gmac.c from OpenBSD)
by using a mask instead of an if branch...
Add a mask variable and replace the code between the comments
"update Z" and "update V" with:
mask = !!(x[i >> 3] & (1 << (~i & 7)));
mask = ~(mask - 1);
z[0] ^= v[0] & mask;
z[1] ^= v[1] & mask;
z[2] ^= v[2] & mask;
z[3] ^= v[3] & mask;
And you should see a nice performance increase...
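
For anyone who wants to try this outside the kernel, here is a minimal
standalone sketch of the two variants side by side. The function names
(gf_shift, gfmul_branch, gfmul_mask) are made up for illustration and are
not in gmac.c. It also assumes the words of Y are already in host order,
so the byte-order conversion is left out; the shift step is written the
way I would expect the "update V" step to look.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_LEN	16	/* GHASH block size in bytes */

/* "update V": shift right by one bit and reduce (hypothetical helper). */
static void
gf_shift(uint32_t v[4])
{
	uint32_t mul = v[3] & 1;

	v[3] = (v[2] << 31) | (v[3] >> 1);
	v[2] = (v[1] << 31) | (v[2] >> 1);
	v[1] = (v[0] << 31) | (v[1] >> 1);
	v[0] = (v[0] >> 1) ^ (0xe1000000 * mul);
}

/* Branch variant: XOR V into Z only when bit i of X is set. */
static void
gfmul_branch(const uint8_t x[BLOCK_LEN], const uint32_t y[4], uint32_t z[4])
{
	uint32_t v[4];
	int i, j;

	memcpy(v, y, sizeof(v));
	memset(z, 0, 4 * sizeof(uint32_t));
	for (i = 0; i < BLOCK_LEN * 8; i++) {
		if (x[i >> 3] & (1 << (~i & 7)))
			for (j = 0; j < 4; j++)
				z[j] ^= v[j];
		gf_shift(v);
	}
}

/* Mask variant: always XOR, but V is ANDed with 0 or 0xffffffff. */
static void
gfmul_mask(const uint8_t x[BLOCK_LEN], const uint32_t y[4], uint32_t z[4])
{
	uint32_t v[4], mask;
	int i, j;

	memcpy(v, y, sizeof(v));
	memset(z, 0, 4 * sizeof(uint32_t));
	for (i = 0; i < BLOCK_LEN * 8; i++) {
		mask = !!(x[i >> 3] & (1 << (~i & 7)));	/* 0 or 1 */
		mask = ~(mask - 1);			/* 0 or all ones */
		for (j = 0; j < 4; j++)
			z[j] ^= v[j] & mask;
		gf_shift(v);
	}
}
```

The mask has to be an unsigned 32-bit type for the ~(mask - 1) step to
turn 1 into all ones and 0 into zero. Both functions should return the
same Z for any input; the mask one just does the same work on every
iteration regardless of the bits of X.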
I also have an implementation of ghash that uses a 4-bit lookup table,
with the table split between cache lines, in p4 at:
https://p4db.freebsd.org/fileViewer.cgi?FSPC=//depot/projects/opencrypto/sys/opencrypto/gfmult.c&REV=4
This also has a version which does 4 blocks at a time, getting a
further speedup...
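
To give a rough idea of what a 4-bit table version looks like, here is a
simplified sketch. It is NOT the code in gfmult.c: the names
(gf_table4_init, gfmul_table4, gfmul_bitwise) are hypothetical, it
reduces with four single-bit shifts per nibble rather than a reduction
table, it does nothing about cache line layout, and it handles one block
at a time. As above, it assumes the words of Y are in host order.

```c
#include <stdint.h>
#include <string.h>

/* Shift right by one bit and reduce (multiply by the field generator). */
static void
gf_shift(uint32_t v[4])
{
	uint32_t mul = v[3] & 1;

	v[3] = (v[2] << 31) | (v[3] >> 1);
	v[2] = (v[1] << 31) | (v[2] >> 1);
	v[1] = (v[0] << 31) | (v[1] >> 1);
	v[0] = (v[0] >> 1) ^ (0xe1000000 * mul);
}

/* Bit-at-a-time multiply using the mask trick, kept here for comparison. */
static void
gfmul_bitwise(const uint8_t x[16], const uint32_t y[4], uint32_t z[4])
{
	uint32_t v[4], mask;
	int i, j;

	memcpy(v, y, sizeof(v));
	memset(z, 0, 4 * sizeof(uint32_t));
	for (i = 0; i < 128; i++) {
		mask = !!(x[i >> 3] & (1 << (~i & 7)));
		mask = ~(mask - 1);
		for (j = 0; j < 4; j++)
			z[j] ^= v[j] & mask;
		gf_shift(v);
	}
}

/* Precompute tbl[n] = n * Y for every 4-bit value n (top bit first). */
static void
gf_table4_init(uint32_t tbl[16][4], const uint32_t y[4])
{
	uint32_t p[4];
	int b, j, n;

	memset(tbl, 0, 16 * 4 * sizeof(uint32_t));
	memcpy(p, y, sizeof(p));
	for (b = 0; b < 4; b++) {
		/* p is Y shifted b times; add it where nibble bit b is set */
		for (n = 0; n < 16; n++)
			if (n & (8 >> b))
				for (j = 0; j < 4; j++)
					tbl[n][j] ^= p[j];
		gf_shift(p);
	}
}

/* Multiply X by the Y baked into tbl, consuming one nibble per step. */
static void
gfmul_table4(const uint8_t x[16], uint32_t tbl[16][4], uint32_t z[4])
{
	unsigned int n;
	int j, k, s;

	memset(z, 0, 4 * sizeof(uint32_t));
	for (k = 31; k >= 0; k--) {		/* last nibble first */
		n = (k & 1) ? (x[k >> 1] & 0x0f) : (x[k >> 1] >> 4);
		for (s = 0; s < 4; s++)
			gf_shift(z);
		for (j = 0; j < 4; j++)
			z[j] ^= tbl[n][j];
	}
}
```

The table only depends on Y (the hash key), so it gets built once and
reused for every block. Note the lookup index depends on the data, which
is presumably why table placement relative to cache lines matters.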
--
John-Mark Gurney Voice: +1 415 225 5579
"All that I will do, has been done, All that I have, has not."