I've been doing an informal "intro to Python" lunchtime series for some co-workers (who are all experienced programmers, in other languages). This week I was going to cover list comprehensions, exceptions, and profiling. So, I did a little demo showing different ways to build a dictionary counting how many times a string appears in some input:
test() implements a "look before you leap" python loop exception() uses a try/catch construct in a similar python loop default() uses a defaultdict count() uses a Counter I profiled it, to show how the profiler works. The full code is below. The input is an 8.8 Mbyte file containing about 570,000 lines (11,000 unique strings). Python 2.7.3 on Ubuntu Precise. As I expected, test() is slower than exception(), which is slower than default(). I'm rather shocked to discover that count() is the slowest of all! I expected it to be the fastest. Or, certainly, no slower than default(). The full profiler dump is at the end of this message, but the gist of it is: ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 0.322 0.322 ./stations.py:42(count) 1 0.159 0.159 0.159 0.159 ./stations.py:17(test) 1 0.114 0.114 0.114 0.114 ./stations.py:27(exception) 1 0.097 0.097 0.097 0.097 ./stations.py:36(default) Why is count() [i.e. collections.Counter] so slow? ------------------------------------------------------------- from collections import defaultdict, Counter def main(): lines = open('stations.txt').readlines() d1 = test(lines) d2 = exception(lines) d3 = default(lines) d4 = count(lines) print d1 == d2 print d1 == d3 print d1 == d4 def test(lines): d = {} for station in lines: if station in d: d[station] += 1 else: d[station] = 1 return d def exception(lines): d = {} for station in lines: try: d[station] += 1 except KeyError: d[station] = 1 return d def default(lines): d = defaultdict(int) for station in lines: d[station] += 1 return d def count(lines): d = Counter(lines) return d if __name__ == '__main__': import cProfile import pstats cProfile.run('main()', 'stations.stats') p = pstats.Stats('stations.stats') p.sort_stats('cumulative').print_stats() ------------------------------------------------------------- 570335 function calls (570332 primitive calls) in 0.776 seconds Ordered by: cumulative time ncalls tottime percall cumtime percall filename:lineno(function) 1 0.023 0.023 0.776 0.776 <string>:1(<module>) 1 0.006 0.006 0.753 0.753 ./stations.py:5(main) 1 0.000 0.000 0.322 0.322 ./stations.py:42(count) 1 0.000 0.000 0.322 0.322 /usr/lib/python2.7/collections.py:407(__init__) 1 0.242 0.242 0.322 0.322 /usr/lib/python2.7/collections.py:470(update) 1 0.159 0.159 0.159 0.159 ./stations.py:17(test) 1 0.114 0.114 0.114 0.114 ./stations.py:27(exception) 1 0.097 0.097 0.097 0.097 ./stations.py:36(default) 570285 0.080 0.000 0.080 0.000 {method 'get' of 'dict' objects} 1 0.055 0.055 0.055 0.055 {method 'readlines' of 'file' objects} 1 0.000 0.000 0.000 0.000 {isinstance} 1 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/abc.py:128(__instancecheck__) 2/1 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/abc.py:148(__subclasscheck__) 3/1 0.000 0.000 0.000 0.000 {issubclass} 4 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:58(__iter__) 1 0.000 0.000 0.000 0.000 {open} 2 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:26(__exit__) 2 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:20(__enter__) 2 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:36(__init__) 3 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:68(__contains__) 2 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:52(_commit_removals) 2 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:81(add) 3 0.000 0.000 0.000 0.000 {getattr} 2 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_weakrefset.py:16(__init__) 2 0.000 0.000 0.000 0.000 {method 'remove' of 'set' objects} 2 0.000 0.000 0.000 0.000 {method '__subclasses__' of 'type' objects} 2 0.000 0.000 0.000 0.000 /home/roy/deploy/current/python/lib/python2.7/_abcoll.py:97(__subclasshook__) 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 4 0.000 0.000 0.000 0.000 {method 'add' of 'set' objects} -- http://mail.python.org/mailman/listinfo/python-list