Hi,
On 10/06/2015 02:15, Robert Collins wrote:
> python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
> 10 loops, best of 3: 76.6 msec per loop
> python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.iteritems(): pass'
> 100 loops, best of 3: 22.6 msec per loop
> .items() is 3x as slow as .iteritems().

Hum, I don't get the same results. Try the attached benchmark. I'm using
my own wrapper on top of timeit, because timeit is bad at calibrating the
benchmark :-/ timeit gives unreliable results.
Results with CPU model Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz:
[ 10 keys ]
713 ns: iteritems
922 ns (+29%): items
[ 10^3 keys ]
42.1 us: iteritems
59.4 us (+41%): items
[ 10^6 keys (1 million) ]
89.3 ms: iteritems
442 ms (+395%): items
In my benchmark, .items() is 5x as slow as .iteritems(). The code to
iterate over 1 million items takes almost half a second. IMO adding more
than 300 ms to each request is not negligible for an application. If this
delay is added multiple times (multiple loops iterating over 1 million
items), we may reach up to 1 second on a user request :-/
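
A rough way to reproduce this kind of comparison with only the standard library is sketched below. It is not the attached benchmark and not the wrapper mentioned above: compat_iteritems, consume and best_time are hypothetical names made up for this illustration, and taking the minimum over several repeats is just one common way to reduce timing noise. The helper falls back to iter(d.items()) on Python 3, where dict.iteritems() does not exist, so the two timings only differ on Python 2.

```python
import timeit


def compat_iteritems(d):
    """Return a lazy iterator over (key, value) pairs on Python 2 and 3."""
    if hasattr(d, "iteritems"):
        return d.iteritems()      # Python 2: no temporary list
    return iter(d.items())        # Python 3: items() is already a lazy view


def consume(iterable):
    """Walk the iterable and return the number of elements seen."""
    count = 0
    for _ in iterable:
        count += 1
    return count


def best_time(func, repeat=5, number=10):
    """Best per-call time in seconds over `repeat` runs of `number` calls."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    d = dict(enumerate(range(10 ** 5)))
    t_items = best_time(lambda: consume(d.items()))
    t_iter = best_time(lambda: consume(compat_iteritems(d)))
    print("items():     %.3f ms" % (t_items * 1e3))
    print("iteritems(): %.3f ms" % (t_iter * 1e3))
```

The absolute numbers depend on the machine and the Python build, so only the ratio between the two lines is meaningful.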
Anyway, when I write patches to port a project to Python 3, I don't want
to change *anything* on Python 2. The API, the performance, the
behaviour, etc. must not change.
I don't want to be responsible for a slowdown, and I don't feel able to
estimate whether replacing dict.iteritems() with dict.items() has a cost
in a real application.
As Ihar wrote: it must be done in a separate patch, by developers
who know the project well.
Currently, most developers writing Python 3 patches are not heavily
involved in each ported project.
There is also dict.itervalues(), not only dict.iteritems().
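
The same lazy-versus-list distinction applies to values. As a minimal sketch (compat_itervalues and the quotas dict are hypothetical, chosen only for illustration; the six library ships a similar helper named six.itervalues), code that must keep the Python 2 behaviour unchanged while also running on Python 3 could look like this:

```python
def compat_itervalues(d):
    """Return a lazy iterator over the values of d on Python 2 and 3."""
    if hasattr(d, "itervalues"):
        return d.itervalues()     # Python 2: no temporary list
    return iter(d.values())       # Python 3: values() is a lazy view


# Hypothetical sample data.
quotas = {"instances": 10, "cores": 20, "ram": 51200}
total = sum(compat_itervalues(quotas))
print(total)
```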
"for key in dict.iterkeys()" can simply be written "for key in dict:".
There is also xrange() vs range(); the debate is similar:
https://review.openstack.org/#/c/185418/
For Python 3, I suggest using "from six.moves import range" to get the
Python 3 behaviour on Python 2: range() is always lazy, it never creates
a temporary list. IMO it makes the code more readable because
"for i in xrange(n):" becomes "for i in range(n):". six only appears in
the import line, and "range()" is better than "xrange()" for developers
starting to learn Python.
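
A small sketch of that import pattern follows. The try/except fallback is an assumption added so the snippet also runs where six is not installed (it presumes Python 3 in that case), and sum_of_squares is a hypothetical function used only to show that the loop body is written once for both versions.

```python
try:
    # six is a third-party package; on Python 2 this binds range to xrange.
    from six.moves import range
except ImportError:
    # Assumption for this sketch: without six we are on Python 3, where the
    # builtin range is already lazy.
    pass


def sum_of_squares(n):
    """Sum i*i for i in range(n) without building a list of n integers."""
    total = 0
    for i in range(n):
        total += i * i
    return total


print(sum_of_squares(10))
```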
Victor
"""
Micro-benchmark comparing dict.items() and dict.iteritems(). Run it with:
./python.orig benchmark.py script bench_str.py --file=orig
./python.patched benchmark.py script bench_str.py --file=patched
./python.patched benchmark.py compare_to orig patched
Download benchmark.py from:
https://bitbucket.org/haypo/misc/raw/tip/python/benchmark.py
"""
import gc


def consume_items(dico):
    # Python 2: items() builds a temporary list of (key, value) tuples.
    for key, value in dico.items():
        pass


def consume_iteritems(dico):
    # Python 2: iteritems() yields the pairs lazily.
    for key, value in dico.iteritems():
        pass


def run_benchmark(bench):
    for nkeys in (10, 10**3, 10**6):
        bench.start_group('%s keys' % nkeys)
        dico = {str(index): index for index in range(nkeys)}

        bench.compare_functions(
            ('iteritems', consume_iteritems, dico),
            ('items', consume_items, dico),
        )

        # Release the dict before building the next, larger one.
        dico = None
        gc.collect()
        gc.collect()


if __name__ == "__main__":
    import benchmark
    benchmark.main()