On 3/30/18 6:41 AM, bartc wrote:
On 27/03/2018 04:49, Richard Damon wrote:
On 3/26/18 8:46 AM, bartc wrote:
Hence my testing with CPython 3.6 rather than something like
PyPy, which can give meaningless results because, for example,
real code doesn't repeatedly execute the same pointless
fragment millions of times. But a real context is too complicated to
set up.
The bigger issue is that these sorts of micro-measurements aren't
actually that good at measuring real quantitative performance costs.
They can often give qualitative indications, but with the way modern
computers work, the processing environment is extremely important to
performance, so these sorts of isolated measurements can often be
misleading. The problem is that if you measure operation a, then
measure operation b, and then assume that doing a then b in a loop
will take a time of a+b, you will quite often be significantly
wrong, as cache behaviour can drastically affect things. So you
really need to do performance testing as part of a practical-sized
exercise, not a micro one, in order to get a real measurement.
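The non-additivity claim above can be checked directly with the stdlib
timeit module. This is only an illustrative sketch: the two operations
(a list comprehension and a dict comprehension) are arbitrary choices,
not anything from the original discussion, and the point is merely that
the combined time need not equal the sum of the separate times.

```python
import timeit

# Time operation a and operation b in isolation, then both in one loop.
# On real hardware, t_ab need not equal t_a + t_b: cache and branch-
# predictor state carried between the two statements changes the cost.
t_a = timeit.timeit("x = [i * 2 for i in range(100)]", number=10_000)
t_b = timeit.timeit("y = {i: i for i in range(100)}", number=10_000)
t_ab = timeit.timeit(
    "x = [i * 2 for i in range(100)]; y = {i: i for i in range(100)}",
    number=10_000,
)
print(f"a alone: {t_a:.4f}s  b alone: {t_b:.4f}s  a then b: {t_ab:.4f}s")
```

The exact numbers vary from run to run and machine to machine, which is
itself part of the point being made.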
That might apply to native code, where the timing behaviour of a
complicated chip like the x86 can be unintuitive.
But my comments were specifically about byte-code executed with
CPython. There the behaviour is a level or two removed from the
hardware and has slightly different characteristics.
(Since the program you are actually executing is the interpreter, not
the Python program, which is merely data. And whatever aggressive
optimisations are done to the interpreter code, they are not affected
by the Python program being run.)
But cache behavior may very well still influence it: a small section
of byte code may exercise only a small part of the interpreter, which
can therefore live entirely (or mostly) in cache and run faster, while
a broader program uses more of the interpreter and may no longer fit
in the cache. In some ways this can be amplified compared with fully
compiled code, since very small changes in byte code can have much
bigger effects on what gets accessed. You probably get less
opportunity for things to speed up by combining pieces, but still
plenty of opportunity for slowdowns.
Another factor you run into is that lookup time can matter: the mere
presence of lots of other code in the test module, even if it never
executes, can affect the speed the test runs at.
--
Richard Damon
--
https://mail.python.org/mailman/listinfo/python-list