Mark H. Wood wrote:
Notice a few things:

o  The OP asked about reducing CPU load, but the answers all talk
   about making encryption faster.  These are not the same thing.
   Offloading encryption might *reduce* throughput of the encrypted
   streams, and yet free up CPU time to do other things.  Encrypted
   communication might not be the highest priority task in the
   system, and there might not be much of it to do per unit time.

Well, the OP has indicated they didn't want to use an embedded processor in their design, just hard-wired logic. This means the device won't have much in the way of 'smarts', which pretty much means the CPU will have to spoon-feed it, unless it uses the buffer design I previously suggested (but I'm hard pressed to see how to implement that without some sort of sequencer in the hardware). If the CPU is going to have to spoon-feed the data (by this, I mean read and write every word to this hardware), then the simple act of writing and reading the data to the hardware will consume CPU time. If the device can't do the encryption faster than the CPU could on its own, it's going to end up taking MORE CPU time.
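To make the spoon-feeding argument concrete, here is a rough per-word cost model. It is only a sketch, and every cycle count is a hypothetical placeholder, not a measurement of any real ARM part or FPGA. The point is that offloading only saves CPU time when the per-word cost of writing, waiting and reading back is lower than the per-word cost of doing the cipher in software.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-operation costs, in CPU cycles. Placeholders only. */
typedef struct {
    uint32_t sw_cycles_per_word;   /* encrypt one 32-bit word in software */
    uint32_t pio_write_cycles;     /* one programmed-I/O write to the device */
    uint32_t pio_read_cycles;      /* one programmed-I/O read from the device */
    uint32_t poll_cycles_per_word; /* CPU time spent waiting on the device per word */
} cost_model;

/* CPU cycles consumed encrypting n words entirely in software. */
static uint64_t cpu_cycles_software(const cost_model *m, uint64_t n_words)
{
    return n_words * m->sw_cycles_per_word;
}

/* CPU cycles consumed when the CPU spoon-feeds the device: every word is
 * written in, waited on, and read back out by the CPU itself. */
static uint64_t cpu_cycles_spoon_fed(const cost_model *m, uint64_t n_words)
{
    return n_words * ((uint64_t)m->pio_write_cycles + m->pio_read_cycles
                      + m->poll_cycles_per_word);
}

/* Nonzero if spoon-fed offload actually costs the CPU fewer cycles. */
static int offload_saves_cpu(const cost_model *m, uint64_t n_words)
{
    assert(m != NULL);
    return cpu_cycles_spoon_fed(m, n_words) < cpu_cycles_software(m, n_words);
}
```

With a slow bus (say 10-cycle reads and writes plus some polling) against a cheap software cipher, the spoon-fed path loses, which is the "MORE CPU time" case above.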

Note: I have something of a background in designing embedded I/O hardware and programming low-level device drivers back in the '80s and '90s.

If I were doing this, I think I'd want just enough of a microcoded sequencer in the FPGA to be able to run out of a buffer RAM chip that's 'dual ported' to the host... (that, or use a bus-mastering DMA engine and locate these buffers in the ARM's own RAM, but that's pretty complex too). This buffer memory could be split into 4 or 8 fixed-size buffers on power-of-two boundaries: 2 for writing data to be encrypted, and 2 for reading back the encrypted data. Perhaps 2 more for writing data to be decrypted, and 2 for reading back the decrypted data, if this thing is to operate in a full-duplex manner and use an asymmetric cipher. It's possible you'd not need separate output buffers and could just write the output over the input... then you could reduce this to just a pair of buffers.
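One reason to put the buffers on power-of-two boundaries is that address decode becomes trivial. Here is a minimal sketch assuming a hypothetical 32 KiB dual-ported window split into 8 buffers of 4 KiB each. The window size, buffer size and role names are all made up for illustration. The buffer number is just the high address bits, and the byte offset is the low bits, so a simple sequencer needs no multiplier.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 8 fixed 4 KiB buffers in a 32 KiB window. */
#define BUF_SHIFT  12u                  /* log2 of buffer size: 4096 bytes */
#define BUF_SIZE   (1u << BUF_SHIFT)
#define BUF_COUNT  8u

/* Hypothetical full-duplex role assignment: two ping-pong buffers per stage. */
enum buf_role {
    BUF_ENC_IN0, BUF_ENC_IN1,    /* plaintext written by host, to be encrypted */
    BUF_ENC_OUT0, BUF_ENC_OUT1,  /* ciphertext read back by host */
    BUF_DEC_IN0, BUF_DEC_IN1,    /* ciphertext written by host, to be decrypted */
    BUF_DEC_OUT0, BUF_DEC_OUT1   /* plaintext read back by host */
};

/* Offset of a buffer within the window: just the index shifted left. */
static uint32_t buf_offset(enum buf_role role)
{
    assert((unsigned)role < BUF_COUNT);
    return (uint32_t)role << BUF_SHIFT;
}

/* Split a window offset into (buffer index, byte within buffer), the same
 * split the FPGA's address-decode logic would do on the bus address. */
static unsigned buf_index_of(uint32_t offset) { return offset >> BUF_SHIFT; }
static uint32_t buf_byte_of(uint32_t offset)  { return offset & (BUF_SIZE - 1u); }
```

Dropping to the in-place variant (output written over input) would just mean using fewer entries in the enum.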

Each buffer could have a few bytes at the beginning or end that contain things like the cipher keys, data length and status/command (or this command/status/key stuff could be in a separate address space stored in on-chip static registers...). The bulk of the actual encryption/decryption could be a hard-wired pipeline; the sequencer just manages the data flow.
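A per-buffer header along those lines might look something like the following. This is a sketch only: the field names, the 128-bit key size, and the command and status codes are all hypothetical, and a real design would pin them down in the FPGA's register map. Here the host fills in key, length and command, and the device is assumed to own the status field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KEY_BYTES 16u  /* hypothetical: a 128-bit block-cipher key */

enum buf_cmd    { CMD_IDLE = 0, CMD_ENCRYPT = 1, CMD_DECRYPT = 2 };
enum buf_status { ST_EMPTY = 0, ST_READY = 1, ST_BUSY = 2, ST_DONE = 3, ST_ERROR = 4 };

/* Hypothetical header at the start of each buffer; payload follows it. */
typedef struct {
    uint8_t  key[KEY_BYTES];  /* cipher key for this work unit */
    uint32_t length;          /* payload bytes following the header */
    uint16_t command;         /* written by host: enum buf_cmd */
    uint16_t status;          /* written by device: enum buf_status */
} buf_header;

/* Prepare a header for a new work unit in a buffer of buf_size bytes.
 * Returns 0 on success, -1 if header plus payload would not fit. */
static int header_init(buf_header *h, const uint8_t key[KEY_BYTES],
                       uint32_t length, enum buf_cmd cmd, uint32_t buf_size)
{
    assert(h != NULL && key != NULL);
    if (buf_size < sizeof *h || length > buf_size - sizeof *h)
        return -1;
    memcpy(h->key, key, KEY_BYTES);
    h->length  = length;
    h->command = (uint16_t)cmd;
    h->status  = ST_READY;    /* hand the buffer to the device */
    return 0;
}
```

Keeping the header inside the buffer means one block move carries both the data and its control information; the separate-register alternative trades that for simpler buffer layout.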

By building the engine this way, the driver software in the ARM host gets an interrupt when a work unit is done. It simply has to block-move the last message out of the buffer and the next message into the buffer, then signal to the chip that it's OK to proceed with that buffer once it's finished with the other one.
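The host side of that ping-pong flow could be sketched roughly like this, assuming the in-place variant with just two buffers. The "device" is simulated with plain structs, and every name here is hypothetical; on real hardware the buffers would live in the dual-ported RAM and `host_on_done_irq()` would run from the interrupt handler. The stand-in pipeline just XORs each byte with a constant so the flow can be exercised; it is NOT encryption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBUF    2
#define BUF_MAX 64u

typedef struct {
    volatile int ready;      /* host sets 1 ("OK to proceed"); device clears when done */
    uint32_t     len;
    uint8_t      data[BUF_MAX];
} dev_buf;

typedef struct {
    dev_buf buf[NBUF];
} device;

/* Block-move a message into buffer i and signal the device to proceed. */
static void host_load(device *d, int i, const uint8_t *msg, uint32_t len)
{
    assert(i >= 0 && i < NBUF && len <= BUF_MAX);
    memcpy(d->buf[i].data, msg, len);
    d->buf[i].len   = len;
    d->buf[i].ready = 1;
}

/* Interrupt path for "buffer i done": block-move the result out, then refill
 * the same buffer with the next message (if any) while the device works on
 * the other buffer. Returns the length of the result copied to out. */
static uint32_t host_on_done_irq(device *d, int i, uint8_t *out,
                                 const uint8_t *next, uint32_t next_len)
{
    uint32_t n = d->buf[i].len;
    memcpy(out, d->buf[i].data, n);
    if (next != NULL)
        host_load(d, i, next, next_len);
    return n;
}

/* Stand-in for the hard-wired pipeline, processing in place. NOT a cipher. */
static void fake_device_process(device *d, int i)
{
    for (uint32_t k = 0; k < d->buf[i].len; k++)
        d->buf[i].data[k] ^= 0x5A;
    d->buf[i].ready = 0;     /* on hardware, this is where the IRQ would fire */
}
```

The CPU only touches each message twice (one copy in, one copy out) per work unit, instead of once per word with a wait in between.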

o  This is a student project.  The objective is to learn something
   specific about the design of digital systems, not (necessarily) to
   maximize throughput.  The requirements don't have to make practical
   sense, so long as they make educational sense.

Yup, you often learn more by failing than you do by succeeding.


______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
User Support Mailing List                    openssl-users@openssl.org
Automated List Manager                           majord...@openssl.org
