Paolo 'Blaisorblade' Giarrusso <p.giarru...@gmail.com> added the comment:
> You may want to check out issue1408710 in which a similar patch was
> provided, but failed to deliver the desired results.

It's not really similar, because you don't duplicate the dispatch code. It took me some time to understand why you didn't change the "goto fast_next_opcode", but that's where you miss the speedup. The only difference your change makes is that it saves the range check for the switch, so the slowdown probably comes from some minor change in GCC's output, I guess.

Anyway, this suggests that the speedup really comes from better branch prediction and not from saving the range check. The first paper I mentioned simply states that saving the range check might make a small difference.

The point is that sometimes, when you are going to flush the pipeline anyway, adding a few instructions, even conditional jumps, does not make a difference. I've observed this behaviour quite a few times while building a small Python interpreter from scratch. I guess (but this might be wrong) that's because the execution units were not being used at their fullest, and adding conditional jumps doesn't make a difference because flushing a pipeline once or twice is almost the same (the second flush removes just a few instructions). Or something like that; I'm not enough of an expert on CPU architecture to be sure of such guesses.

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4753>
_______________________________________