Hi,

On 10/06/2015 02:15, Robert Collins wrote:
> python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
> 10 loops, best of 3: 76.6 msec per loop
>
> python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.iteritems(): pass'
> 100 loops, best of 3: 22.6 msec per loop
>
> .items() is 3x as slow as .iteritems().

Hum, I don't get the same results. Try the attached benchmark. I use my own wrapper on top of timeit, because timeit is bad at calibrating the benchmark :-/ so it gives unreliable results.
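benchmark.py itself is not reproduced here. As a rough illustration of what "calibrating" a micro-benchmark means, here is a minimal sketch that assumes only the standard timeit module: it doubles the loop count until one run lasts at least min_time seconds, then keeps the best of several repeats. The helper names calibrate() and bench() are hypothetical, and this is not how benchmark.py is implemented.

```python
import timeit


def calibrate(func, min_time=0.1):
    """Double the loop count until one measurement lasts at least min_time seconds."""
    number = 1
    while True:
        elapsed = timeit.timeit(func, number=number)
        if elapsed >= min_time:
            return number
        number *= 2


def bench(func, repeat=5, min_time=0.1):
    """Return the best time per call, in seconds, over `repeat` calibrated runs."""
    number = calibrate(func, min_time)
    timings = timeit.repeat(func, number=number, repeat=repeat)
    # The minimum is the run least disturbed by other processes.
    return min(timings) / number


if __name__ == "__main__":
    d = dict(enumerate(range(1000)))

    def consume_items():
        for key, value in d.items():
            pass

    print("%.1f us per loop" % (bench(consume_items) * 1e6))
```

Calibrating first avoids measuring runs that are too short to rise above timer resolution and scheduling noise.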

Results on an Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz:

[ 10 keys ]
713 ns: iteritems
922 ns (+29%): items

[ 10^3 keys ]
42.1 us: iteritems
59.4 us (+41%): items

[ 10^6 keys (1 million) ]
89.3 ms: iteritems
442 ms (+395%): items
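The percentages are the relative overhead of items() over iteritems(), i.e. (items / iteritems - 1) x 100. A quick check using only the numbers above (overhead_percent() is a hypothetical helper name):

```python
def overhead_percent(base, other):
    """Extra time of `other` relative to `base`, in percent."""
    # float() avoids integer division on Python 2.
    return (float(other) / base - 1.0) * 100.0


# (label, iteritems, items) timings from the results above; units cancel out.
results = [
    ("10 keys", 713, 922),        # ns
    ("10^3 keys", 42.1, 59.4),    # us
    ("10^6 keys", 89.3, 442),     # ms
]

for label, iteritems, items in results:
    print("%s: +%.0f%%" % (label, overhead_percent(iteritems, items)))
# prints +29%, +41% and +395%
```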

In my benchmark, .items() is 5x as slow as .iteritems(). Iterating on 1 million items with .items() takes almost half a second. IMO adding more than 300 ms to each request is not negligible for an application. If this delay is added multiple times (multiple loops iterating on 1 million items), we may reach 1 second on a user request :-/
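The extra cost per loop is the difference between the two 10^6 timings (442 - 89.3, about 353 ms). Assuming, hypothetically, that one request runs such a loop several times, the extra latency grows linearly, and three loops already exceed one second:

```python
# Per-loop timings for 1 million keys, from the results above (milliseconds).
ITERITEMS_MS = 89.3
ITEMS_MS = 442.0


def extra_latency_ms(loops):
    """Extra time added to a request running `loops` loops over 1M items."""
    return loops * (ITEMS_MS - ITERITEMS_MS)


for loops in (1, 2, 3):
    print("%d loop(s): +%.0f ms" % (loops, extra_latency_ms(loops)))
# prints +353 ms, +705 ms and +1058 ms
```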

Anyway, when I write patches to port a project to Python 3, I don't want to change *anything* on Python 2: the API, the performance, the behaviour, etc. must not change.

I don't want to be responsible for a slowdown, and I don't feel able to estimate whether replacing dict.iteritems() with dict.items() has a cost in a real application.

As Ihar wrote: it must be done in a separate patch, by developers who know the project well.

Currently, most developers writing Python 3 patches are not heavily involved in the projects they port.
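One way to port code without changing its Python 2 behaviour is to keep lazy iteration on Python 2 and call items()/values() only on Python 3. The six library provides six.iteritems() and six.itervalues() for this; the sketch below is a hypothetical local equivalent (iteritems() and itervalues() here are not six's code) so that it runs without six installed.

```python
import sys

PY2 = sys.version_info[0] == 2


def iteritems(d):
    """Hypothetical helper: iterate on (key, value) pairs, never building a list."""
    if PY2:
        return d.iteritems()
    # Python 3: items() returns a view; wrap it in an iterator.
    return iter(d.items())


def itervalues(d):
    """Hypothetical helper: iterate on values, never building a list."""
    if PY2:
        return d.itervalues()
    return iter(d.values())


# Usage: the loop body is the same on Python 2 and Python 3.
for key, value in iteritems({"a": 1}):
    pass
```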

The same question applies to dict.itervalues(), not only to dict.iteritems().
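On Python 3, the iter*() methods are gone because keys(), values() and items() return lightweight views instead of lists: creating a view does not copy the dict, so the extra list is only built on Python 2. A small check (Python 3 only; on Python 2 these calls return lists):

```python
d = {"a": 1, "b": 2}

values = d.values()     # a view, not a copy
items = d.items()

d["c"] = 3              # views reflect later changes to the dict
print(len(values), len(items))   # 3 3
print(isinstance(values, list))  # False
```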

"for key in dict.iterkeys()" can simply be written "for key in dict:".
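Iterating directly on a dict yields its keys on both Python 2 and Python 3, in the same order as keys(), without the temporary list that keys() builds on Python 2:

```python
d = {"a": 1, "b": 2, "c": 3}

# Same keys, same order; the first form needs no iterkeys()/keys() call.
from_iteration = [key for key in d]
from_keys = list(d.keys())

print(from_iteration == from_keys)  # True
```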

There is also xrange() vs range(); the debate is similar:
https://review.openstack.org/#/c/185418/

For Python 3, I suggest using "from six.moves import range" to get the Python 3 behaviour on Python 2: range() never builds a temporary list. IMO it makes the code more readable because "for i in xrange(n):" becomes "for i in range(n):". six only appears in the imports, and "range()" is easier than "xrange()" for developers starting to learn Python.
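six.moves.range maps to xrange on Python 2 and to the builtin range on Python 3. A minimal sketch of the same idea without the six dependency, assuming it sits at module level (inside a function, the assignment would make range a local name):

```python
import sys

try:
    range = xrange  # Python 2: xrange() does not build a list
except NameError:
    pass            # Python 3: range() is already lazy

numbers = range(10 ** 6)
print(type(numbers).__name__)  # "xrange" on Python 2, "range" on Python 3
# A lazy range object is far smaller than the equivalent list.
print(sys.getsizeof(numbers) < sys.getsizeof(list(numbers)))  # True
```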

Victor
"""
Micro-benchmark comparing iteration with dict.items() and dict.iteritems() (Python 2). Run it with:

./python.orig benchmark.py script bench_str.py --file=orig
./python.patched benchmark.py script bench_str.py --file=patched
./python.patched benchmark.py compare_to orig patched

Download benchmark.py from:

https://bitbucket.org/haypo/misc/raw/tip/python/benchmark.py
"""
import gc

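def consume_items(dico):
    # On Python 2, dict.items() builds a temporary list of (key, value) tuples.
    for key, value in dico.items():
        pass


def consume_iteritems(dico):
    # dict.iteritems() yields (key, value) pairs lazily, without a temporary list.
    for key, value in dico.iteritems():
        pass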


def run_benchmark(bench):
    for nkeys in (10, 10**3, 10**6):
        bench.start_group('%s keys' % nkeys)
        dico = {str(index): index for index in range(nkeys)}

        bench.compare_functions(
            ('iteritems', consume_iteritems, dico),
            ('items', consume_items, dico),
        )
        dico = None
        gc.collect()
        gc.collect()

if __name__ == "__main__":
    import benchmark
    benchmark.main()
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)