[EMAIL PROTECTED] wrote: > Klaas wrote: > > Benchmarks? > > There is one (fixed in a succesive post) in the original thread I was > referring to: > http://groups.google.com/group/it.comp.lang.python/browse_thread/thread/aff60c644969f9b/ > If you want I can give more of them (and a bit less silly, with strings > too, etc). <>
Sorry, I didn't see any numbers. I ran it myself and found the defaultdict version to be approximately twice as slow. This, as you suggest, is the worst case, as you are using integers as hash keys (essentially no hashing cost) and are accessing each key exactly once. > > > (and slowing down other uses of the class) > > All it has to do is to cheek if the default_factory is an int, it's > just an "if" done only once, so I don't think it slows down the other > cases significantly. Once it makes that check, surely it must check a flag or some such every time it is about to invoke the key constructor function? > > especially when the faster alternative is so easy to code. > > The faster alternative is easy to create, but the best faster > alternative can't be coded, because if you code it in Python you need > two hash accesses, while the defaultdict can require only one of them: > > if n in d: > d[n] += 1 > else: > d[n] = 1 How do you think that defaultdict is implemented? It must perform the dictionary access to determine that the value is missing. It must then go through the method dispatch machinery to look for the __missing__ method, and execute it. If you _really_ want to make this fast, you should write a custom distionary subclass which accepts an object (not function) as default value, and assigns it directly. > >If that performance difference matters, > > With Python it's usually difficult to tell if some performance > difference matters. Probably in some programs it may matter, but in > most other programs it doesn't matter. This is probably true for all > the performance tweaks I may invent in the future too. In general, I agree, but in this case it is quite clear. The only possible speed up is for defaultdict(int). The re-write using regular dicts is trivial, hence, for given piece of code is it quite clear whether the performance gain is important. This is not an interpreter-wide change, after all. Consider also that the performance gains would be relatively unsubstantial when more complicated keys and a more realistic data distribution is used. Consider further that the __missing__ machinery would still be called. Would the resulting construct be faster than the use of a vanilla dict? I doubt it. But you can prove me wrong by implementing it and benchmarking it. > > you would likely find more fruitful > > gains in coding it in c, using PyDict_SET > > I've just started creating a C lib for related purposes, I'd like to > show it to you all on c.l.p, but first I have to find a place to put it > on :-) (It's not easy to find a suitable place, it's a python + c + > pyd, and it's mostly an exercise). Would suggesting a webpage be too trite? -Mike -- http://mail.python.org/mailman/listinfo/python-list