> I think it also improves branch target prediction - if you have a tight
> loop of a few opcodes the predictor can guess where you're headed (since
> there is a separate lookup key for each opcode), whereas with the
> original code, there's a single key which cannot be used to predict the
> branch target.
At least usually. I caught at least one version of gcc CSEing the indirect jump into a single common location in one of my interpreters :-( Not all versions do it, though, and I hope gcc gets fixed. Even when it happens, the threaded code is usually still faster for other reasons.

-Andi
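
For reference, a minimal sketch of the two dispatch styles being compared. It is not taken from any real interpreter: the opcodes, `run_switch`, and `run_threaded` are made up for illustration, and the threaded version assumes the bytecode is valid and ends in `OP_HALT` (no bounds check). It relies on the GNU C labels-as-values extension, so it needs gcc or a compatible compiler.

```c
#include <stddef.h>

/* Hypothetical opcodes, for illustration only. */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

/* Switch dispatch: every opcode goes through the one indirect jump
 * the compiler emits for the switch, so the predictor sees a single
 * branch site for all opcodes. */
static long run_switch(const unsigned char *pc)
{
    long acc = 0;
    for (;;) {
        switch (*pc++) {
        case OP_INC:    acc++;    break;
        case OP_DEC:    acc--;    break;
        case OP_DOUBLE: acc *= 2; break;
        case OP_HALT:   return acc;
        default:        return -1; /* unknown opcode */
        }
    }
}

/* Threaded dispatch: each handler ends in its own "goto *", so each
 * opcode gets a separate indirect branch site, unless the compiler
 * merges them back into one. Assumes well-formed bytecode. */
static long run_threaded(const unsigned char *pc)
{
    static void *const targets[] = {
        &&do_inc, &&do_dec, &&do_double, &&do_halt
    };
    long acc = 0;

#define DISPATCH() goto *targets[*pc++]
    DISPATCH();
do_inc:    acc++;    DISPATCH();
do_dec:    acc--;    DISPATCH();
do_double: acc *= 2; DISPATCH();
do_halt:   return acc;
#undef DISPATCH
}
```

Whether the separate jumps survive is visible in the generated assembly (`gcc -O2 -S`): count the indirect jumps in `run_threaded`. The gcc manual's note on `-fgcse` suggests trying `-fno-gcse` for programs that use computed gotos, which may help when the jumps get merged.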