Nick Glencross <[EMAIL PROTECTED]> wrote:
Still some way off the OS md5sum, which is typically 0.15 seconds, about 12x quicker. That may sound quite a bit, but much of it can probably be accounted for by inefficiencies in my conversion to parrot code (a slightly awkward rol, and perhaps the manipulation of the arrays).
A new opcode C<rol> would certainly help, yes. It would replace 4 instructions (each roughly executed once per char) with one instruction.
A JITted C<rol> opcode should give a speedup of one fourth - which is a lot.
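To illustrate the four-for-one trade mentioned above, here is a minimal sketch in C, not Parrot code. It assumes the four instructions are a left shift, a subtraction, a right shift and an or, which is the usual way to compose a 32-bit rotate; the actual sequence in the md5 conversion may differ. The function names rol32_composed and rol32 are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left built from four separate operations, roughly what the
 * bytecode has to do without a dedicated opcode (assumed sequence).
 * Valid for 1 <= n <= 31, which covers the MD5 shift amounts. */
static uint32_t rol32_composed(uint32_t x, unsigned n)
{
    assert(n >= 1 && n <= 31);      /* x >> 32 would be undefined */
    uint32_t left  = x << n;        /* 1: shift left            */
    unsigned inv   = 32 - n;        /* 2: complementary count   */
    uint32_t right = x >> inv;      /* 3: logical shift right   */
    return left | right;            /* 4: combine both halves   */
}

/* The same rotate as a single expression. Masking the counts keeps it
 * well defined for any n, including 0. Many compilers recognise this
 * pattern and emit one rotate instruction on x86. */
static uint32_t rol32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}
```

Both functions return the same value for every count from 1 to 31, so a single rotate op could stand in for the composed sequence without changing the MD5 result.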
I was a bit too optimistic with my assumptions here. I have now implemented a C<rot> opcode, and JITted the one signature of it that the MD5 code uses for x86. But the speedup is much smaller: around 5%.
md5sum of perl-5.8.0.tar.gz size=11023084
md5sum           0.11 user   0.20 real
parrot -j        2.63 user   2.68 real
parrot -j rot    2.57 user   2.75 real
The problem with the md5 code and the Parrot JIT seems to be related to the register allocator. The md5 code is one big basic block of integer code. As we don't do any register renaming, CPU register usage is suboptimal, especially on x86.
leo