On 10/08/2010 03:27 PM, kj wrote:
I tried to understand this by looking at the C source but I gave
up after 10 fruitless minutes.  (This has been invariably the
outcome of all my attempts at finding my way through the Python C
source.)

It's not you. CPython's code is ... [censored]

Anyway, you don't even need to read C code to understand how sets are implemented.

There is a (now deprecated) Python module, Libs/set.py, that has implementations for `Set` and `ImmutableSet` (nowadays `set` and `frozenset`).

The implementation strategy you can see there is quite simple. The code uses dictionary keys to store the set items and "ignores" the dictionary values, so that `.add(value)` is implemented as `._dict[value] = some_value_nobody_cares_about`.

Here comes a very limited example set implementation using a dict:

class PrimitiveSet(object):
    def __init__(self):
        self._dict = {}

    def add(self, value):
        self._dict[value] = True

    def __contains__(self, value):
        return value in self._dict

    def __repr__(self):
        return 'PrimitiveSet(%r)' % self._dict.keys()

>>> s = PrimitiveSet()
>>> 'hello' in s
False
>>> s.add('hello')
>>> 'hello' in s
True
>>> s
PrimitiveSet(['hello'])
>>> s.add(tuple(xrange(10)))
>>> s
PrimitiveSet([(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), 'hello'])
>>> s.add(xrange(5))
>>> s
PrimitiveSet([(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), xrange(5), 'hello'])

This has a few implications for sets:
* dict keys are unordered/sorted. so are sets.
* dict keys are unique. same for set values.
* dict keys have to be hashable (immutable). same for sets values.

So far our implementation is not hashable, and we need a custom implementation for __hash__ (because dicts aren't hashable, so we cannot re-use dictionary methods). There is one requirement for set hashes: they have to be independent of the item order (there *is* an order in memory of course, and it may vary depending on the order assignments to our dict are done).

Here is an extract from the Python set implementation, `BaseSet._compute_hash`:

def _compute_hash(self):
    # Calculate hash code for a set by xor'ing the hash codes of
    # the elements.  This ensures that the hash code does not depend
    # on the order in which elements are added to the set. [...]
    result = 0
    for elt in self:
        result ^= hash(elt)
    return result

Hope this helps :-)

Jonas
--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to