After downloading a dozen PDF files I've given up. All I need is the approximate cycle counts for instructions and address modes.
The particular problem I've got now is deciding which of these three is the fastest:

    movl (%edi,%eax,4),%eax
    movl (%edi,%eax),%eax
    movl (%edi),%eax

Same with:

    movl $1,(%eax,%edx,4)
    movl $1,(%eax,%edi)

According to an old 486 book I have, complex addressing modes don't have cycle penalties for leaving out the scale or the offset. That seems hard to believe for the RISC-like P3s and Athlons.

What about other processors? Is it common to have address modes like base+offset*scale? Most RISC instruction sets only provide base + constant offset, don't they?

Yeah, yeah, I know. Premature optimization is the root of all evil. Except this isn't premature. Getting to 50 mops was pretty easy. Getting to 100 mops is a *lot* harder!

- Ken