Alexandre Vassalotti <alexan...@peadrop.com> added the comment:

Paolo wrote:
> So, can you try dropping the switch altogether, using always computed
> goto and seeing how does the resulting code get compiled?

Removing the switch won't be possible unless we change the semantics of
EXTENDED_ARG. In addition, I doubt the improvement, if any, would be worth
the increased complexity.

> To be absolutely clear: x86_64 has more registers, so the rest of the
> interpreter is faster than x86, but dispatch still takes the same
> absolute time, which is 70% on x86_64, but only 50% on x86 (those are
> realistic figures);

I don't understand what you mean by "absolute time" here. Do you actually
mean the time spent interpreting bytecodes compared to the time spent in
the other parts of Python? If so, your figures are wrong for CPython on
x86-64. It is about 50%, just like on x86 (when running pybench). With the
patch, this drops to 35% on x86-64 and to 45% on x86.

> In my toy interpreter, computing last_i for each dispatch doesn't give
> any big slowdown, but storing it in f->last_i gives a ~20% slowdown.

I patched ceval.c to minimize f->last_i manipulations in the dispatch
code. On x86, I got an extra 9% speedup on pybench. However, the patch is
a bit clumsy and a few unit tests are failing. I will see if I can improve
it and open a new issue if it proves worthwhile.

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4753>
_______________________________________
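
To make the f->last_i point concrete, here is a minimal sketch of a toy
interpreter, not the ceval.c patch itself. Everything in it is hypothetical
(ToyFrame, toy_run, the OP_* opcodes); only the field name last_i is borrowed
from the discussion. It dispatches with computed gotos where the GCC/Clang
"labels as values" extension is available, falls back to a switch elsewhere,
and writes last_i once at exit instead of on every dispatch. The real
interpreter needs the value in more places than this toy does.

```c
#include <assert.h>

/* Hypothetical toy opcodes; these are not CPython's. */
enum { OP_PUSH, OP_ADD, OP_HALT, OP_COUNT };

/* Hypothetical frame; only the name last_i comes from the thread. */
typedef struct {
    int last_i;   /* index of the last instruction executed */
} ToyFrame;

/* Runs `code` until OP_HALT and returns the value on top of the stack. */
static int toy_run(ToyFrame *f, const unsigned char *code)
{
    int stack[16];
    int sp = 0;
    const unsigned char *next_instr = code;

#if defined(__GNUC__)
    /* Computed goto: each handler ends with its own indirect jump. */
    static void *const targets[OP_COUNT] = { &&do_push, &&do_add, &&do_halt };
#define DISPATCH() \
    do { assert(*next_instr < OP_COUNT); goto *targets[*next_instr++]; } while (0)

    DISPATCH();
do_push:
    assert(sp < 16);
    stack[sp++] = *next_instr++;   /* operand is the byte after the opcode */
    DISPATCH();
do_add:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();
do_halt:
    goto done;
#undef DISPATCH
#else
    /* Portable fallback: a single switch inside a loop. */
    for (;;) {
        switch (*next_instr++) {
        case OP_PUSH:
            assert(sp < 16);
            stack[sp++] = *next_instr++;
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_HALT:
            goto done;
        default:
            assert(0 && "unknown opcode");
            goto done;
        }
    }
#endif

done:
    /* Derive the index from next_instr once, rather than storing it into
       the frame on every dispatch. next_instr already points one past the
       opcode that stopped the loop, hence the -1. */
    f->last_i = (int)(next_instr - code) - 1;
    assert(sp >= 1);
    return stack[sp - 1];
}
```

Running the program { OP_PUSH, 2, OP_PUSH, 5, OP_ADD, OP_HALT } through
toy_run returns 7 and leaves last_i at 5, the index of OP_HALT.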