Mark H. Wood wrote:
Notice a few things:
o The OP asked about reducing CPU load, but the answers all talk
about making encryption faster. These are not the same thing.
Offloading encryption might *reduce* throughput of the encrypted
streams, and yet free up CPU time to do other things. Encrypted
communication might not be the highest priority task in the
system, and there might not be much of it to do per unit time.
Well, the OP has indicated they don't want to use an embedded processor
in their design, just hard-wired logic. This means the device won't
have much in the way of 'smarts', which pretty much means the CPU will
have to spoon-feed it, unless it uses the buffer design I previously
suggested (but I'm hard pressed to see how to implement that without
some sort of sequencer in the hardware). If the CPU has to spoon-feed
the data (by this I mean read and write every word to this hardware),
then the simple act of writing the data to the hardware and reading it
back will consume CPU time, and if the device can't do the encryption
faster than the CPU could on its own, it's going to end up taking MORE
CPU time.
Note: I have something of a background in designing embedded I/O
hardware and programming low-level device drivers back in the 80s/90s.
If I were doing this, I think I'd want just enough of a microcoded
sequencer in the FPGA to run out of a buffer RAM chip that's 'dual
ported' to the host (that, or use a bus-mastering DMA engine and locate
these buffers in the ARM's own RAM, but that's pretty complex too).
This buffer memory could be split into 4 or 8 fixed-size buffers on
power-of-two boundaries: 2 for writing data to be encrypted and 2 for
reading back the encrypted data, plus perhaps 2 more for writing data
to be decrypted and 2 for reading back the decrypted data, if this
thing is to operate in a full-duplex manner and use an asymmetrical
cypher.
It's possible you wouldn't need separate output buffers and could just
write the output over the input; then you could reduce this to just a
pair of buffers.
Each buffer could have a few bytes at the beginning or end containing
things like the cypher keys, the data length, and status/command (or
this command/status/key stuff could live in a separate address space
stored in on-chip static registers). The bulk of the actual
encryption/decryption could be a hard-wired pipeline; the sequencer
just manages the data flow.
By building the engine this way, the driver software on the ARM host
gets an interrupt when a work unit is done, and simply has to
block-move the last message out of the buffer and the next message into
it, then signal to the chip that it's OK to proceed with that buffer
when it's finished with the other one.
o This is a student project. The objective is to learn something
specific about the design of digital systems, not (necessarily) to
maximize throughput. The requirements don't have to make practical
sense, so long as they make educational sense.
Yup, you often learn more by failing than you do by succeeding.
______________________________________________________________________
OpenSSL Project http://www.openssl.org
User Support Mailing List openssl-users@openssl.org
Automated List Manager majord...@openssl.org