On 19/04/17 00:33, bartc wrote:
> So that's 'label-pointers' which I assume must correspond to computed goto.
Yes - just different terminology. Being able to take the address of a label and "goto address" rather than "goto label".
> (I don't know why they should be faster than a switch; they just are.)
In C, the code generated for a switch() can do almost anything. The compiler might cache the expression in a temporary and emit a linear "if .. else if .. else if" chain, which is probably quite common for a sparsely populated set of values, after an initial range check. Or it might emit a jump table (similar to the computed gotos), which is probably quite common for a densely populated set of values, after an initial range check and an adjustment to make the index zero-based. It could also generate some hybrid of the two: for example, a switch with several densely populated areas in an otherwise sparse set of values might use linear range checks to pick the right jump table for each dense area.
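To make that concrete, here is a toy pair of switches (my example, not anything from bartc's code); exactly what a given gcc version emits will vary, but the dense one is typically lowered to a jump table and the sparse one to a compare/branch chain:

    /* Dense case values: after one range check, gcc will usually emit
       an indirect jump through a table of five code addresses. */
    int dense(int op)
    {
        switch (op) {
        case 0: return 10;
        case 1: return 20;
        case 2: return 30;
        case 3: return 40;
        case 4: return 50;
        default: return -1;
        }
    }

    /* Sparse case values: a table would be mostly holes, so gcc will
       usually emit a short chain of compares (or a binary search). */
    int sparse(int op)
    {
        switch (op) {
        case 2:    return 10;
        case 70:   return 20;
        case 900:  return 30;
        case 5000: return 40;
        default:   return -1;
        }
    }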
What the computed goto stuff does is effectively reduce all of that to a single jump table with a known set of indices - there are 256 opcodes starting from 0 and each has an entry in an array of label pointers. To jump to the handler for an opcode, just 'goto *handlers[op]'. No range checking, no index adjustment, no linear test sequences. This is what makes the dispatch fast.
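For anyone who hasn't seen it, here is a minimal sketch of that dispatch style using GCC's labels-as-values extension. The opcodes and handlers are invented for the example (a real interpreter would have the full 256-entry table):

    #include <stdio.h>

    enum { OP_INC, OP_DEC, OP_PRINT, OP_HALT };

    int run(const unsigned char *code)
    {
        /* One label address per opcode, indexed directly by the opcode. */
        static void *handlers[] = { &&do_inc, &&do_dec, &&do_print, &&do_halt };
        const unsigned char *ip = code;
        int acc = 0;

        /* No range check, no index adjustment: fetch, index, jump. */
        #define DISPATCH() goto *handlers[*ip++]

        DISPATCH();
    do_inc:   acc++;               DISPATCH();
    do_dec:   acc--;               DISPATCH();
    do_print: printf("%d\n", acc); DISPATCH();
    do_halt:  return acc;
    }

    int main(void)
    {
        const unsigned char prog[] = { OP_INC, OP_INC, OP_PRINT, OP_HALT };
        return run(prog) == 2 ? 0 : 1;
    }

(The &&label and goto *ptr syntax is a GCC/Clang extension, not standard C, which is why CPython guards its equivalent behind a configure check.)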
> With the sort of lower level programs I write (in another dynamic language not Python), such an assembly layer improved performance 2-3 times over using 100% HLL compiled using C and gcc -O3.
Did you give the C compiler enough hints, though? Much like the computed-goto stuff above (which is less a hint and more an instruction that the compiler should use a jump table and nothing else for that bit of code), there are lots of other ways of spelling things in C that can give better performance than what you get by default - and I'm not even talking about compiler-specific #pragmas or whatever.
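As one example of the kind of "spelling" I mean (my illustration, nothing to do with bartc's actual code): loads that the compiler cannot prove are loop-invariant often have to be repeated every iteration because of possible aliasing, and simply spelling the loop with a local copy fixes that:

    /* The compiler generally has to reload *count every iteration,
       because the writes through dst (a char *) might alias it. */
    void copy1(char *dst, const char *src, int *count)
    {
        for (int i = 0; i < *count; i++)
            dst[i] = src[i];
    }

    /* The same loop spelled with a local copy of the count (marking
       the pointers 'restrict' is another option): the load can now be
       hoisted and the loop vectorised more readily. */
    void copy2(char *dst, const char *src, int *count)
    {
        int n = *count;
        for (int i = 0; i < n; i++)
            dst[i] = src[i];
    }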
Also, remember that -O3 might (and by that I mean probably will! ;)) make your code larger. If some specific core areas of your interpreter are now large enough to cause instruction cache misses, then a smaller -O2 (or even -Os) build might perform better on your hardware.
E.