Serhiy Storchaka added the comment:

> Sorry, I don't understand how running 1 iteration instead of 10 makes the
> benchmark less reliable. IMO the reliability is more impacted by the number
> of repetitions (-r). I changed the default from 3 to 5 repetitions, so
> timeit should be *more* reliable in Python 3.7 than 3.6.
Caches. Not the high-level caching that can make a measurement meaningless, but low-level caching, for example CPU memory caches, which causes small differences (small, but possibly larger than the effect you are trying to measure). On every repetition the setup code runs first, and then the test code runs in a loop. After the first pass through the loop the memory cache is filled with the data in use, so the following passes can be faster. Running the setup code on the next repetition can evict that data from the cache, and the loop then has to load it back from slower memory. Thus on every repetition the first pass through the loop is slower than the following ones. If you run 10 or 100 loop iterations the difference can be negligible, but if you run only one, the result can differ by 10% or more.

> $ python3.6 -m timeit 'pass'
> 100000000 loops, best of 3: 0.0339 usec per loop

This example is meaningless. 0.0339 usec is not the time of executing "pass"; it is the overhead of one iteration of timeit's own loop. You can't use timeit to measure code that takes so little time: you just can't get a reliable result for it. Even for code that takes an order of magnitude more time the result is not very reliable. Thus there is no need to worry about timings much below 1 usec.
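To make the cache point concrete, here is a minimal sketch using timeit.repeat (the workload, summing a hypothetical "data" list, is my own illustration, not from the issue): with number=1 each repetition times a loop body that runs immediately after the setup code, so the per-loop results spread much more than with number=100.

    import timeit

    # Illustrative workload: setup builds the data, stmt touches all of it.
    setup = "data = list(range(100000))"
    stmt = "sum(data)"

    for number in (1, 100):
        # repeat() returns the total time of each repetition;
        # divide by `number` to get the per-loop time.
        times = [t / number
                 for t in timeit.repeat(stmt, setup, number=number, repeat=5)]
        spread = (max(times) - min(times)) / min(times) * 100
        print("number=%d: best=%.1f usec/loop, spread=%.1f%%"
              % (number, min(times) * 1e6, spread))

On a typical machine the number=1 line shows a noticeably larger spread between repetitions, which is exactly the warm-up effect described above.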
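And a sketch of the overhead argument (the iteration count is an arbitrary choice of mine): timing "pass" only measures the per-iteration cost of timeit's internal loop, so any statement whose true cost is near that overhead cannot be measured meaningfully.

    import timeit

    n = 10_000_000
    # "pass" compiles to nothing, so this measures only the loop overhead.
    overhead = timeit.timeit("pass", number=n) / n
    print("per-iteration overhead: %.1f ns" % (overhead * 1e9))
    # A statement whose real cost is comparable to this overhead (well
    # under 1 usec) drowns in it, so its timeit result is not reliable.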