On 21/11/2016 14:50, Steve D'Aprano wrote:
On Mon, 21 Nov 2016 11:09 pm, BartC wrote:

Modern machines run multi-tasking operating systems, where there can be
other processes running. Depending on what you use as your timer, you may
be measuring the time that those other processes run. The OS can cache
frequently used pieces of code, which allows it to run faster. The CPU
itself will cache some code.

You get to know after a while what kinds of processes affect timings: streaming a movie at the same time, for example. So when you need to compare timings, you turn those off.

The shorter the code snippet, the more these complications are relevant. In
this particular case, we can be reasonably sure that the time it takes to
create a list range(10000) and the overhead of the loop is *probably* quite
a small percentage of the time it takes to perform 100000 vector
multiplications. But that's not a safe assumption for all code snippets.

Yes, it was one of those crazy things that Python used to have to do, creating a list of N numbers just in order to be able to count to N.

But that's not significant here. Either experience, or a preliminary test with an empty loop, or using xrange, or using Py3, will show that the loop overheads for N iterations in this case are small in comparison to executing the bodies of the loops.
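
A minimal sketch of the preliminary empty-loop test mentioned above. The names `time_loop` and `work` are hypothetical, and `work` is only a stand-in for a real loop body; `time.perf_counter` assumes Python 3.3 or later.

```python
import time

def time_loop(n, body=None):
    """Return elapsed seconds for n iterations, calling body() each time if given."""
    start = time.perf_counter()
    if body is None:
        for _ in range(n):
            pass          # empty loop: measures loop overhead only
    else:
        for _ in range(n):
            body()        # loop overhead plus call overhead plus the body itself
    return time.perf_counter() - start

def work():
    # Hypothetical stand-in for the real loop body being measured.
    return sum(i * i for i in range(50))

n = 100000
empty = time_loop(n)
full = time_loop(n, work)
print("empty loop: %.4f s" % empty)
print("with body:  %.4f s" % full)
print("loop overhead as a share of the total: %.1f%%" % (100 * empty / full))
```

If the share printed on the last line is small, the loop overhead can reasonably be ignored when comparing two loop bodies; if it is not, the empty-loop time can be subtracted from both measurements.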

This is why the timeit module exists: to do the right thing when it matters,
so that you don't have to think about whether or not it matters. The timeit
module works really really hard to get good quality, accurate timings,
minimizing any potential overhead.

The timeit module automates a bunch of tricky-to-get-right best practices for
timing code. Is that a problem?

The problem is that it substitutes a bunch of tricky-to-get-right options and syntax which have to be typed /at the command line/. And you really don't want to have to write code at the command line (especially if it is sourced from elsewhere, which means you have to transcribe it).


But if you prefer doing it "old school" from within Python, then:

from timeit import Timer
t = Timer('np.cross(x, y)',  setup="""
import numpy as np
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
""")

# take five measurements of 100000 calls each, and report the fastest
result = min(t.repeat(number=100000, repeat=5))/100000
print(result)  # time in seconds per call

Better?

A bit, but the code is now inside a string!

Code will normally exist as a proper part of a module, not on the command line, in a command history, or in a string, so why not test it running inside a module?

But I've done a lot of benchmarking, and actually measuring execution time is just part of it. This test, for example, I ran from inside a function, not at module level, as that is more typical.

Are the variables inside a timeit string globals or locals? It's just a lot of extra factors to worry about, and extra things to get wrong.
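
For reference, a hedged sketch of two ways timeit can time code that lives in a module rather than in a string. By default the statement string is compiled into a function inside timeit's own namespace, so names must come from the setup string; alternatively, a callable can be passed instead of a string, or (Python 3.5 and later) a namespace can be supplied through the `globals` argument. The pure-Python `cross` and the wrapper `bench` are hypothetical stand-ins, used so the example runs without numpy.

```python
import timeit

def cross(a, b):
    """Pure-Python 3-vector cross product (stand-in for np.cross)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

x = (1, 2, 3)
y = (4, 5, 6)

def bench():
    # Ordinary code in an ordinary function; no string involved.
    cross(x, y)

# Option 1: pass a callable. The measurement includes one extra call per iteration.
t_callable = min(timeit.repeat(bench, number=100000, repeat=5)) / 100000

# Option 2: keep the statement as a string, but say explicitly which names it sees.
namespace = {"cross": cross, "x": x, "y": y}
t_string = min(timeit.repeat("cross(x, y)", globals=namespace,
                             number=100000, repeat=5)) / 100000

print("callable: %.3g s per call" % t_callable)
print("string:   %.3g s per call" % t_string)
```

In the string form, `cross`, `x` and `y` are looked up as globals of the supplied namespace; in the callable form they are looked up wherever `bench` itself would normally find them.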

The loop timings used by the OP showed one took considerably longer than the other. And that was confirmed by others. There's nothing wrong with that method.

--
Bartc

--
https://mail.python.org/mailman/listinfo/python-list
