Re: [zfs-discuss] Encryption accelerator card recommendations.

Andrew Gabriel Tue, 28 Jun 2011 12:29:06 -0700

 On 06/27/11 11:32 PM, Bill Sommerfeld wrote:

On 06/27/11 15:24, David Magda wrote:

Given the amount of transistors that are available nowadays I think
it'd be simpler to just create a series of SIMD instructions right
in/on general CPUs, and skip the whole co-processor angle.

see: http://en.wikipedia.org/wiki/AES_instruction_set


Present in many current Intel CPUs; also expected to be present in AMD's
"Bulldozer" based CPUs.

I recall seeing a blog comparing the existing Solaris hand-tuned AES assembler performance with the (then) new AES instruction version, where the Intel AES instructions only got you about a 30% performance increase. I've seen reports of better performance improvements, but usually by comparing with the performance on older processors, which are going to be slower for additional reasons than just missing the AES instructions. Also, you could claim a better performance improvement if you compared against a less efficient original implementation of AES. What this means is that a faster CPU may buy you more crypto performance than the AES instructions alone will do.

My understanding from reading the Intel AES instruction set (which I warn might not be completely correct) is that the AES encryption/decryption instruction is executed between 10 and 14 times (depending on key length) for each 128 bits (16 bytes) of data being encrypted/decrypted, so it's very much part of the regular instruction pipeline. The code will have to loop through this process multiple times to process a data block bigger than 16 bytes, i.e. a double nested loop, although I expect it's normally loop-unrolled to a fair degree for optimisation purposes.
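
To make the double nested loop concrete, here is a minimal sketch that only counts how many AES round instructions would be issued; it does no actual encryption. The function names are made up for illustration, and it assumes the standard AES round counts (10, 12 and 14 for 128-, 192- and 256-bit keys) with one round instruction per round.

```python
AES_BLOCK_BYTES = 16  # AES always works on 128-bit (16-byte) blocks


def aes_rounds(key_bits):
    """Number of AES rounds for a given key length (10, 12 or 14)."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key length must be 128, 192 or 256 bits")
    return key_bits // 32 + 6


def count_round_instructions(data_len, key_bits):
    """Count round instructions needed for data_len bytes.

    Hypothetical helper: it mirrors the loop structure only and
    assumes data_len is a whole number of 16-byte blocks.
    """
    if data_len % AES_BLOCK_BYTES != 0:
        raise ValueError("data_len must be a multiple of 16 bytes")
    count = 0
    # Outer loop: one pass per 16-byte block of data.
    for _offset in range(0, data_len, AES_BLOCK_BYTES):
        # Inner loop: one round instruction per AES round.
        for _round in range(aes_rounds(key_bits)):
            count += 1
    return count


if __name__ == "__main__":
    for bits in (128, 192, 256):
        print(bits, count_round_instructions(1024, bits))
```

For a 1 kbyte buffer that is 64 blocks, so the count is 64 times the number of rounds. Real implementations would unroll the inner loop and typically interleave several blocks to keep the pipeline busy.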

Conversely, the crypto units in the T-series processors are separate from the CPU, and do the encryption/decryption whilst the CPU is getting on with something else, and they do it much faster than it could be done on the CPU. Small blocks are normally a problem for crypto offload engines because the overhead of farming off the work to the engine and getting the result back often means that you can do the crypto on the CPU faster than the time it takes to get the crypto engine started and stopped. However, T-series crypto is particularly good at handling small blocks efficiently, such as the roughly 1 kbyte you are likely to find in a network packet, as it is much more closely coupled to the CPU than a PCI crypto card can be, and performance with small packets was key for the crypto networking support T-series was designed for. Of course, it handles crypto of large blocks just fine too.
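
The small-block problem can be sketched as a simple break-even model, assuming a fixed per-request offload overhead and constant throughput on both sides. The function names and the figures in the usage lines are hypothetical, chosen only to illustrate the shape of the trade-off; they are not measurements of any real CPU, T-series unit or PCI card.

```python
def cpu_time(n_bytes, cpu_bytes_per_sec):
    """Time to encrypt n_bytes on the CPU (no setup cost assumed)."""
    return n_bytes / cpu_bytes_per_sec


def offload_time(n_bytes, engine_bytes_per_sec, overhead_sec):
    """Time to encrypt n_bytes on an offload engine.

    overhead_sec is the assumed fixed cost of handing the work to
    the engine and getting the result back.
    """
    return overhead_sec + n_bytes / engine_bytes_per_sec


def break_even_bytes(cpu_bytes_per_sec, engine_bytes_per_sec, overhead_sec):
    """Block size above which offloading beats doing it on the CPU."""
    if engine_bytes_per_sec <= cpu_bytes_per_sec:
        return float("inf")  # the engine never catches up
    per_byte_saving = 1.0 / cpu_bytes_per_sec - 1.0 / engine_bytes_per_sec
    return overhead_sec / per_byte_saving


if __name__ == "__main__":
    # Hypothetical figures: 1 GB/s on the CPU, 4 GB/s on the engine.
    for overhead in (3e-6, 0.3e-6):
        n = break_even_bytes(1e9, 4e9, overhead)
        print("overhead %.1f us -> break-even %.0f bytes" % (overhead * 1e6, n))
```

In this model the break-even block size scales directly with the fixed overhead, which is why a closely coupled engine can still win on blocks of around 1 kbyte while a card with a larger round-trip cost cannot.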


--
Andrew
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
