Paolo 'Blaisorblade' Giarrusso <p.giarru...@gmail.com> added the comment:

> I'm not an expert in this kind of optimization. Could we gain more
> speed by making the dispatcher table denser? Python has fewer than
> 128 opcodes (len(opcode.opmap) == 113), so they could be squeezed
> into a smaller table. I naively assume a smaller table increases the
> number of cache hits.

Well, a new release has no binary compatibility constraint, so this can be tried and benchmarked, or simply done anyway. On x86_64 the jump table costs 8 bytes per pointer * 256 pointers = 2 KiB, while the L1 data cache of a Pentium 4 is only 8 KiB or 16 KiB, depending on the model, so the table takes a sizable fraction of it; see the first sketch below.

Still, I don't expect the difference to be noticeable in most synthetic microbenchmarks, since they touch little data. Matrix multiplication would be the perfect one, I guess: the repeated column accesses would kill the L1 data cache if the matrices don't fit in it entirely. The second sketch below shows that access pattern.
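A minimal sketch of what a denser table buys, assuming a computed-goto dispatch loop (the GCC "labels as values" extension, as used by the patch on this issue). The opcode set and bytecode format below are made up for illustration; this is not ceval.c:

    #include <stdio.h>

    /* Four toy opcodes instead of CPython's 113; the point is that the
     * table is sized to the real opcode count, not to a full 256. */
    enum { OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT, NUM_OPCODES };

    static int run(const unsigned char *code)
    {
        /* NUM_OPCODES * 8 bytes on x86_64 instead of 256 * 8 = 2 KiB,
         * so the table occupies fewer L1 data cache lines. */
        static void *dispatch[NUM_OPCODES] = {
            &&op_push1, &&op_add, &&op_print, &&op_halt
        };
        int stack[16], *sp = stack;

    #define DISPATCH() goto *dispatch[*code++]

        DISPATCH();

    op_push1:  *sp++ = 1;              DISPATCH();
    op_add:    sp--; sp[-1] += sp[0];  DISPATCH();
    op_print:  printf("%d\n", sp[-1]); DISPATCH();
    op_halt:   return sp > stack ? sp[-1] : 0;
    }

    int main(void)
    {
        /* computes and prints 1 + 1 */
        const unsigned char prog[] = { OP_PUSH1, OP_PUSH1, OP_ADD,
                                       OP_PRINT, OP_HALT };
        run(prog);
        return 0;
    }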
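And the benchmark idea, sketched in C only to make the access pattern visible; the actual microbenchmark would be the same triple loop written in pure Python, so that the interpreter's jump table and the matrix data compete for the same L1 lines. The size is illustrative:

    #include <stdio.h>

    #define N 512   /* 512*512 doubles = 2 MiB per matrix, far beyond
                       an 8-16 KiB L1 data cache */

    static double A[N][N], B[N][N], C[N][N];

    int main(void)
    {
        /* Naive triple loop: B[k][j] walks down a column, so each step
         * of k touches a cache line N*8 bytes away from the previous
         * one, evicting whatever else lives in L1. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                double sum = 0.0;
                for (int k = 0; k < N; k++)
                    sum += A[i][k] * B[k][j];
                C[i][j] = sum;
            }
        printf("%f\n", C[0][0]);  /* keep the result observable */
        return 0;
    }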
> I'm not an expert in this kind of optimizations. Could we gain more speed by making the dispatcher table more dense? Python has less than 128 opcodes (len(opcode.opmap) == 113) so they can be squeezed in a smaller table. I naively assume a smaller table increases the amount of cache hits. Well, you have no binary compatibility constraint with a new release, so it can be tried and benchmarked, or it can be done anyway! On x86_64 the impact of the jump table is 8 bytes per pointer * 256 pointers = 2KiB, and the L1 data cache of Pentium4 can be 8KiB or 16KiB wide. But I don't expect this to be noticeable in most synthetic microbenchmarks. Matrix multiplication would be the perfect one I guess; the repeated column access would kill the L1 data cache, if the whole matrixes don't fit. _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue4753> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com