On Tuesday 02 February 2016 06:32, Sven R. Kunze wrote:

> On 31.01.2016 02:48, Steven D'Aprano wrote:
>> On Sunday 31 January 2016 09:47, Sven R. Kunze wrote:
>>
>>> @all
>>> What's the best/standardized tool in Python to perform benchmarking?
>>
>> timeit
>
> Thanks, Steven.
>
> Maybe, I am doing it wrong but I get some weird results:
You need to understand that in any modern desktop, server or laptop computer (embedded devices may be different) there is *constantly* a large amount of processing going on in the background. Your OS is responding to network events, managing memory, switching from one process to another, scheduling tasks; your anti-virus and other background programs are working; your window environment is watching the mouse, etc. So each time you run a test, it comes down to pure dumb luck how many other programs are busy at the same time. (You can, sometimes, influence that by quitting all other applications, unplugging the network cable, and keeping your hands off the keyboard and mouse while the test is running. But who does that?)

All that adds noise to the measured times. It's not unusual for timings to differ by a factor of ten from one run to the next. The speed will also differ depending on processor cache misses and many other factors that are effectively impossible to predict with any certainty. This makes timing measurements on modern systems an inherently noisy process.

In effect, each measurement you take is made up of two components:

* the actual time that the code would take if it had exclusive access to the machine with no other programs running, call it t;

* the noise added by the system, call it Δt.

It is impossible to measure t alone; you always get t+Δt. The aim in timing is to reduce the effect of the noise as much as possible. The way to do this with timeit is:

- don't run extraneous code as part of your test (just measure what you care about, with as little scaffolding around it as possible);

- run that code snippet as many times as you can bear in a loop, *but* let timeit manage the loop; don't try doing it by hand;

- only care about "best of N", where N is as large a number as you can stand;

- averages, standard deviation, and other statistics are pretty much useless here; the only statistic that you care about is the minimum.

Out of those four guidelines, you missed three of them:

(1) Your test code includes scaffolding, the "for _ in range..." loop. You're timing how expensive it is to build a range object and loop over it, on top of the heap operation you actually care about.

(2) You picked a tiny number for the number of loops: just one (the second argument to repeat()), doing the 10000 iterations by hand inside the snippet instead of letting timeit do them. Timeit defaults to one million loops, which is good for *really* fast and tiny snippets. I normally reduce that to 10000, but run a larger number of trials. If the timing from each trial is too small, I increase the number of loops, and if it takes too long (I am impatient) I decrease it, but never below 100.

(3) You then use the repeat() method with each trial being a single run of the snippet, which makes every trial a "best of one", and that is pretty much pointless. (There is a timeit() method for "best of one", if that's ever useful.) I normally run 5 or 7 trials, and report only the best (smallest).
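If you want to see how much trial-to-trial noise you are dealing with on your own machine, print all of the raw results from repeat() instead of just the minimum. Here's a quick sketch of what I mean (the sorted() snippet is only a stand-in, not your heap code):

from timeit import Timer

# A stand-in snippet; substitute whatever you actually want to measure.
t = Timer("sorted(data)", setup="data = list(range(1000))")

# Seven trials, each trial running the snippet 10000 times.
results = t.repeat(repeat=7, number=10000)

print(results)       # raw per-trial times -- expect visible scatter
print(min(results))  # the only number worth reporting

The spread between the largest and smallest of those trials is the system noise at work; the minimum is as close as you can get to the "true" time t.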
Here's your code:

> >>> min(timeit.Timer('for _ in range(10000): heappop(h)', 'from heapq
> import heappop; h=list(range(10000000))').repeat(10, 1)),
> min(timeit.Timer('for _ in range(10000): h.pop()', 'from xheap import
> Heap; h=Heap(range(10000000))').repeat(10, 1))
> (0.017267618000005314, 0.01615345600021101)

Your code would be easier to read and understand if it were split over multiple lines. This is not Perl and there's no advantage to one-liners.

Pulling your code apart and re-writing it in a way which I feel is more understandable:

from timeit import Timer

t1 = Timer(
    'for _ in range(10000): heappop(h)',  # code being tested
    'from heapq import heappop; h=list(range(10000000))'  # setup
    )
t2 = Timer(
    'for _ in range(10000): h.pop()',
    'from xheap import Heap; h=Heap(range(10000000))'
    )

min(t1.repeat(10, 1))
min(t2.repeat(10, 1))

Something like this will probably be less noisy:

setup = """
from heapq import heappop
from xheap import Heap
a = list(range(10000000))
h = Heap(a)
"""
t1 = Timer("heappop(a)", setup)
t2 = Timer("h.pop()", setup)

# print best of 7 trials, where each trial runs the code snippet 10000 times
print(min(t1.repeat(repeat=7, number=10000)))
print(min(t2.repeat(repeat=7, number=10000)))

The times printed will be in seconds per 10000 runs; divide by 10 to get the time in *milliseconds* per run.

Note that this will *not* eliminate all noise or variation from one timing test to the other, but it will (I hope!) reduce it. If you're still seeing a lot of variation, try turning the garbage collector off and quitting as many other applications as possible. If anyone else can suggest any other techniques for reducing noise, I'd love to hear about them.

Also note that times generated by timeit with one version of Python may not be directly comparable to those from a different version. From Python 3.3 onwards, timeit uses time.perf_counter() as its default clock, which has a higher resolution than the older platform-dependent default timers. The behaviour of the underlying clock is also operating system dependent.

> How can it be that my wrapper is sometimes faster and sometimes slower
> than heapq? I wouldn't mind slower, but faster*?

Maybe your code genuinely is "about as fast" as heapq, so the differences you're seeing are just due to noise in the system.


-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list