On 10/08/2010 03:27 PM, kj wrote:
I tried to understand this by looking at the C source but I gave
up after 10 fruitless minutes. (This has been invariably the
outcome of all my attempts at finding my way through the Python C
source.)
It's not you. CPython's code is ... [censored]
Anyway, you don't even need to read C code to understand how sets are
implemented.
There is a (now deprecated) Python module, Libs/set.py, that has
implementations for `Set` and `ImmutableSet` (nowadays `set` and
`frozenset`).
The implementation strategy you can see there is quite simple. The code
uses dictionary keys to store the set items and "ignores" the dictionary
values, so that `.add(value)` is implemented as `._dict[value] =
some_value_nobody_cares_about`.
Here comes a very limited example set implementation using a dict:
class PrimitiveSet(object):
def __init__(self):
self._dict = {}
def add(self, value):
self._dict[value] = True
def __contains__(self, value):
return value in self._dict
def __repr__(self):
return 'PrimitiveSet(%r)' % self._dict.keys()
>>> s = PrimitiveSet()
>>> 'hello' in s
False
>>> s.add('hello')
>>> 'hello' in s
True
>>> s
PrimitiveSet(['hello'])
>>> s.add(tuple(xrange(10)))
>>> s
PrimitiveSet([(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), 'hello'])
>>> s.add(xrange(5))
>>> s
PrimitiveSet([(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), xrange(5), 'hello'])
This has a few implications for sets:
* dict keys are unordered/sorted. so are sets.
* dict keys are unique. same for set values.
* dict keys have to be hashable (immutable). same for sets values.
So far our implementation is not hashable, and we need a custom
implementation for __hash__ (because dicts aren't hashable, so we cannot
re-use dictionary methods).
There is one requirement for set hashes: they have to be independent of
the item order (there *is* an order in memory of course, and it may vary
depending on the order assignments to our dict are done).
Here is an extract from the Python set implementation,
`BaseSet._compute_hash`:
def _compute_hash(self):
# Calculate hash code for a set by xor'ing the hash codes of
# the elements. This ensures that the hash code does not depend
# on the order in which elements are added to the set. [...]
result = 0
for elt in self:
result ^= hash(elt)
return result
Hope this helps :-)
Jonas
--
http://mail.python.org/mailman/listinfo/python-list