On Sun, Dec 10, 2000 at 12:24:59PM -0800, Alfred Perlstein wrote:
> I would try unrolling the loop some (if possible) and retesting.

The inner loop was already unrolled, but it was only processing a
single byte at a time.  Loading 32-bit words instead reduced the cost
from 13 cycles per byte to 7.
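
For illustration only, here is a rough sketch of the difference between
byte loads and word loads.  This is not the actual code under
discussion: the byte-sum operation and the names sum_bytes and
sum_words are made up for the example, and the cycle counts above were
not measured on it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Baseline: one memory load per input byte. */
uint32_t sum_bytes(const unsigned char *p, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += p[i];
    return sum;
}

/* One 32-bit load per four input bytes; the individual bytes are
 * then split out of the word with shifts and masks. */
uint32_t sum_words(const unsigned char *p, size_t n)
{
    uint32_t sum = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        uint32_t w;
        memcpy(&w, p + i, sizeof w);    /* safe for unaligned input */
        sum += (w & 0xff) + ((w >> 8) & 0xff)
             + ((w >> 16) & 0xff) + (w >> 24);
    }
    for (; i < n; i++)                  /* leftover 0-3 tail bytes */
        sum += p[i];
    return sum;
}
```

Both functions return the same result for any input, since a sum does
not depend on byte order.  Whether the word version is actually faster
depends on the CPU and compiler, so it needs to be measured.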
-- 
Bruce Guenter <[EMAIL PROTECTED]>                       http://em.ca/~bruceg/
