On 08/09/2012 06:54 PM, Andrew Cooper wrote: > On 09/08/2012 23:26, Dave Angel wrote: >> On 08/09/2012 06:03 PM, Andrew Cooper wrote: >>> On 09/08/2012 22:34, Roman Vashkevich wrote: >>>> Actually, they are different. >>>> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred >>>> thousand entries, and you will feel the difference. >>>> Dict uses hashing to get a value from the dict and this is why it's O(1). >>>> >>> Sligtly off topic, but looking up a value in a dictionary is actually >>> O(n) for all other entries in the dict which suffer a hash collision >>> with the searched entry. >>> >>> True, a sensible choice of hash function will reduce n to 1 in common >>> cases, but it becomes an important consideration for larger datasets. >>> >>> ~Andrew >> I'm glad you're wrong for CPython's dictionaries. The only time the >> lookup would degenerate to O[n] would be if the hash table had only one >> slot. CPython sensibly increases the hash table size when it becomes >> too small for efficiency. >> >> >> Where have you seen dictionaries so poorly implemented? >> > Different n, which I should have made more clear. I was using it for > consistency with O() notation. My statement was O(n) where n is the > number of hash collisions. That's a little like doing a survey, and reporting the results as showing that 100% of the women hit their husbands, among the population of women who hit their husbands.
In your original message, you already stated the assumption that a proper hash algorithm would be chosen, then went on to apparently claim that large datasets would still have an order n problem. That last is what I was challenging. The rest of your message here refers to client code, not the system. > The choice of hash algorithm (or several depending on the > implementation) should specifically be chosen to reduce collisions to > aid in efficient space utilisation and lookup times, but any > implementation must allow for collisions. There are certainly runtime > methods of improving efficiency using amortized operations. > > As for poor implementations, > > class Foo(object): > > ... > > def __hash__(self): > return 0 > > I seriously found that in some older code I had the misfortune of > reading. It didn't remain in that state for long. > > ~Andrew -- DaveA -- http://mail.python.org/mailman/listinfo/python-list