Paolo 'Blaisorblade' Giarrusso <p.giarru...@gmail.com> added the comment:

> I'm not an expert in this kind of optimization. Could we gain more
> speed by making the dispatcher table denser? Python has fewer than
> 128 opcodes (len(opcode.opmap) == 113), so they could be squeezed
> into a smaller table. I naively assume a smaller table increases the
> number of cache hits.

Well, a new release has no binary compatibility constraint, so this can be tried and benchmarked, or simply done anyway. On x86_64 the jump table costs 8 bytes per pointer * 256 pointers = 2 KiB, while the L1 data cache of a Pentium 4 is only 8 KiB or 16 KiB, depending on the model, so the table takes a sizable fraction of it; see the first sketch below.

Still, I don't expect the difference to be noticeable in most synthetic microbenchmarks, since they touch little data. Matrix multiplication would be the perfect one, I guess: the repeated column accesses would kill the L1 data cache if the matrices don't fit in it entirely. The second sketch below shows that access pattern.
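A minimal sketch of what a denser table buys, assuming a computed-goto dispatch loop (the GCC "labels as values" extension, as used by the patch on this issue). The opcode set and bytecode format below are made up for illustration; this is not ceval.c:

    #include <stdio.h>

    /* Four toy opcodes instead of CPython's 113; the point is that the
     * table is sized to the real opcode count, not to a full 256. */
    enum { OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT, NUM_OPCODES };

    static int run(const unsigned char *code)
    {
        /* NUM_OPCODES * 8 bytes on x86_64 instead of 256 * 8 = 2 KiB,
         * so the table occupies fewer L1 data cache lines. */
        static void *dispatch[NUM_OPCODES] = {
            &&op_push1, &&op_add, &&op_print, &&op_halt
        };
        int stack[16], *sp = stack;

    #define DISPATCH() goto *dispatch[*code++]

        DISPATCH();

    op_push1:  *sp++ = 1;              DISPATCH();
    op_add:    sp--; sp[-1] += sp[0];  DISPATCH();
    op_print:  printf("%d\n", sp[-1]); DISPATCH();
    op_halt:   return sp > stack ? sp[-1] : 0;
    }

    int main(void)
    {
        /* computes and prints 1 + 1 */
        const unsigned char prog[] = { OP_PUSH1, OP_PUSH1, OP_ADD,
                                       OP_PRINT, OP_HALT };
        run(prog);
        return 0;
    }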
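And the benchmark idea, sketched in C only to make the access pattern visible; the actual microbenchmark would be the same triple loop written in pure Python, so that the interpreter's jump table and the matrix data compete for the same L1 lines. The size is illustrative:

    #include <stdio.h>

    #define N 512   /* 512*512 doubles = 2 MiB per matrix, far beyond
                       an 8-16 KiB L1 data cache */

    static double A[N][N], B[N][N], C[N][N];

    int main(void)
    {
        /* Naive triple loop: B[k][j] walks down a column, so each step
         * of k touches a cache line N*8 bytes away from the previous
         * one, evicting whatever else lives in L1. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                double sum = 0.0;
                for (int k = 0; k < N; k++)
                    sum += A[i][k] * B[k][j];
                C[i][j] = sum;
            }
        printf("%f\n", C[0][0]);  /* keep the result observable */
        return 0;
    }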
> I'm not an expert in this kind of optimizations. Could we gain more speed by making the dispatcher table more dense? Python has less than 128 opcodes (len(opcode.opmap) == 113) so they can be squeezed in a smaller table. I naively assume a smaller table increases the amount of cache hits. Well, you have no binary compatibility constraint with a new release, so it can be tried and benchmarked, or it can be done anyway! On x86_64 the impact of the jump table is 8 bytes per pointer * 256 pointers = 2KiB, and the L1 data cache of Pentium4 can be 8KiB or 16KiB wide. But I don't expect this to be noticeable in most synthetic microbenchmarks. Matrix multiplication would be the perfect one I guess; the repeated column access would kill the L1 data cache, if the whole matrixes don't fit. _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue4753> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com