On Sun, 28 Jul 2013 15:59:04 -0400, Roy Smith wrote:

[...]
> I'm rather shocked to discover that count() is the slowest
> of all! I expected it to be the fastest. Or, certainly, no slower than
> default().
>
> The full profiler dump is at the end of this message, but the gist of it
> is:
>
> ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>      1    0.000    0.000    0.322    0.322 ./stations.py:42(count)
>      1    0.159    0.159    0.159    0.159 ./stations.py:17(test)
>      1    0.114    0.114    0.114    0.114 ./stations.py:27(exception)
>      1    0.097    0.097    0.097    0.097 ./stations.py:36(default)
>
> Why is count() [i.e. collections.Counter] so slow?
It's within a factor of 2 of test, and about 3 of exception or default
(give or take). I don't think that's surprisingly slow.

In 2.7, Counter is written in Python, while defaultdict has an
accelerated C version. I expect that has something to do with it.

Calling Counter ends up calling essentially this code:

    for elem in iterable:
        self[elem] = self.get(elem, 0) + 1

(although micro-optimized), where "iterable" is your data (lines).
Calling the get method has higher overhead than dict[key], so that will
also contribute.

-- 
Steven
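
For anyone who wants to check the get() overhead for themselves, here is a minimal sketch that times the three counting idioms from the thread against collections.Counter. The function names and the synthetic data are made up for illustration and are not taken from stations.py. The numbers will depend on your Python version; I believe later Python 3 releases give Counter a C helper for its counting loop, so do not expect the 2.7 ratios to carry over.

```python
import timeit
from collections import Counter, defaultdict


def count_with_get(iterable):
    # Same shape as the pure-Python loop described above:
    # one get() method call per element.
    counts = {}
    for elem in iterable:
        counts[elem] = counts.get(elem, 0) + 1
    return counts


def count_with_defaultdict(iterable):
    # defaultdict supplies the missing 0, so no method call in the loop.
    counts = defaultdict(int)
    for elem in iterable:
        counts[elem] += 1
    return counts


def count_with_exception(iterable):
    # Plain dict[key] access; KeyError is raised only for first sightings.
    counts = {}
    for elem in iterable:
        try:
            counts[elem] += 1
        except KeyError:
            counts[elem] = 1
    return counts


if __name__ == "__main__":
    # Hypothetical data: 100000 items drawn from 500 distinct keys.
    data = ["station-%d" % (i % 500) for i in range(100000)]
    candidates = (count_with_get, count_with_defaultdict,
                  count_with_exception, Counter)
    for func in candidates:
        seconds = timeit.timeit(lambda: func(data), number=10)
        print("%-24s %.3f s" % (func.__name__, seconds))
```

All four produce the same counts, so any difference in the timings comes from how each one handles the lookup and the missing-key case.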