[EMAIL PROTECTED] wrote: > This post sums some things I have written in another Python newsgroup. > More than 40% of the times I use defaultdict like this, to count > things: > > >>> from collections import defaultdict as DD > >>> s = "abracadabra" > >>> d = DD(int) > >>> for c in s: d[c] += 1 > ... > >>> d > defaultdict(<type 'int'>, {'a': 5, 'r': 2, 'b': 2, 'c': 1, 'd': 1}) > > But I have seen that if keys are quite sparse, and int() becomes called > too much often, then code like this is faster: > > >>> d = {} > >>> for c in s: > ... if c in d: d[c] += 1 > ... else: d[c] = 1 > ... > >>> d > {'a': 5, 'r': 2, 'b': 2, 'c': 1, 'd': 1} > > So to improve the speed for such special but common situation, the > defaultdict can manage the case with default_factory=int in a different > and faster way.
Benchmarks? I doubt it is worth complicating defaultdict's code (and slowing down other uses of the class) for this improvement... especially when the faster alternative is so easy to code. If that performance difference matters, you would likely find more fruitful gains in coding it in c, using PyDict_SET. -Mike -- http://mail.python.org/mailman/listinfo/python-list