On 19/04/17 00:33, bartc wrote:
> So that's 'label-pointers' which I assume must correspond to computed goto.
Yes - just different terminology. Being able to take the address of a label and "goto address" rather than "goto label".
> (I don't know why they should be faster than a switch; they just are.)
In C, the code generated for a switch() can do almost anything. The compiler might cache the expression in a temporary and emit a linear "if .. else if .. else if" chain, which is probably quite common for a sparsely populated set of values, after an initial range check. Or it might emit a jump table (similar to the computed gotos), which is probably quite common for a densely populated set of values, after an initial range check and an adjustment to make the index zero-based. It could also generate some hybrid of the two: for example, a switch with several densely populated areas in an otherwise sparse set of values might use linear range checks to pick the right jump table for each dense area.
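To make that concrete, here is a toy pair of switches (my example, not anything from bartc's code); exactly what a given gcc version emits will vary, but the dense one is typically lowered to a jump table and the sparse one to a compare/branch chain:

    /* Dense case values: after one range check, gcc will usually emit
       an indirect jump through a table of five code addresses. */
    int dense(int op)
    {
        switch (op) {
        case 0: return 10;
        case 1: return 20;
        case 2: return 30;
        case 3: return 40;
        case 4: return 50;
        default: return -1;
        }
    }

    /* Sparse case values: a table would be mostly holes, so gcc will
       usually emit a short chain of compares (or a binary search). */
    int sparse(int op)
    {
        switch (op) {
        case 2:    return 10;
        case 70:   return 20;
        case 900:  return 30;
        case 5000: return 40;
        default:   return -1;
        }
    }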
What the computed goto stuff does is effectively reduce all of that to a single jump table with a known set of indices - there are 256 opcodes starting from 0 and each has an entry in an array of label pointers. To jump to the handler for an opcode, just 'goto *handlers[op]'. No range checking, no index adjustment, no linear test sequences. This is what makes the dispatch fast.
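For anyone who hasn't seen it, here is a minimal sketch of that dispatch style using GCC's labels-as-values extension. The opcodes and handlers are invented for the example (a real interpreter would have the full 256-entry table):

    #include <stdio.h>

    enum { OP_INC, OP_DEC, OP_PRINT, OP_HALT };

    int run(const unsigned char *code)
    {
        /* One label address per opcode, indexed directly by the opcode. */
        static void *handlers[] = { &&do_inc, &&do_dec, &&do_print, &&do_halt };
        const unsigned char *ip = code;
        int acc = 0;

        /* No range check, no index adjustment: fetch, index, jump. */
        #define DISPATCH() goto *handlers[*ip++]

        DISPATCH();
    do_inc:   acc++;               DISPATCH();
    do_dec:   acc--;               DISPATCH();
    do_print: printf("%d\n", acc); DISPATCH();
    do_halt:  return acc;
    }

    int main(void)
    {
        const unsigned char prog[] = { OP_INC, OP_INC, OP_PRINT, OP_HALT };
        return run(prog) == 2 ? 0 : 1;
    }

(The &&label and goto *ptr syntax is a GCC/Clang extension, not standard C, which is why CPython guards its equivalent behind a configure check.)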
> With the sort of lower level programs I write (in another dynamic language not Python), such an assembly layer improved performance 2-3 times over using 100% HLL compiled using C and gcc -O3.
Did you give the C compiler enough hints, though? Much like the computed-goto stuff above (which is less a hint and more an instruction that the compiler should use a jump table and nothing else for that bit of code), there are lots of other ways of spelling things in C that can give better performance than what you get by default - and I'm not even talking about compiler-specific #pragmas or whatever.
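As one example of the kind of "spelling" I mean (my illustration, nothing to do with bartc's actual code): loads that the compiler cannot prove are loop-invariant often have to be repeated every iteration because of possible aliasing, and simply spelling the loop with a local copy fixes that:

    /* The compiler generally has to reload *count every iteration,
       because the writes through dst (a char *) might alias it. */
    void copy1(char *dst, const char *src, int *count)
    {
        for (int i = 0; i < *count; i++)
            dst[i] = src[i];
    }

    /* The same loop spelled with a local copy of the count (marking
       the pointers 'restrict' is another option): the load can now be
       hoisted and the loop vectorised more readily. */
    void copy2(char *dst, const char *src, int *count)
    {
        int n = *count;
        for (int i = 0; i < n; i++)
            dst[i] = src[i];
    }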
Also, remember that -O3 might (and by that I mean probably will! ;)) make your code larger. If some specific core areas of your interpreter are now large enough to cause instruction cache misses, then a smaller -O2 (or even -Os) build might perform better on your hardware.
E.