Equivalence is a class that can be used to maintain a partition of objects into equivalence sets, making sure that the equivalence properties (reflexivity, symmetry, transitivity) are preserved. Two objects x and y are considered equivalent either implicitly (through a key function) or explicitly by calling merge(x,y).
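To make the idea concrete, here is a minimal sketch of how such a partition could be maintained with a union-find (disjoint-set) structure over keys. It is only an illustration under stated assumptions: the class name EquivalenceSketch and all of its internals are hypothetical and are not the package's actual implementation.

```python
class EquivalenceSketch:
    """Illustrative union-find partition; NOT the equivalence package's code."""

    def __init__(self, key=None):
        # key maps each object to a hashable value; objects with equal keys
        # are implicitly equivalent. By default an object is its own key.
        self._key = key if key is not None else (lambda obj: obj)
        self._parent = {}  # key -> parent key in the union-find forest

    def _find(self, k):
        # Register unseen keys as singleton sets, then walk up to the root,
        # shortening the path along the way (path halving).
        self._parent.setdefault(k, k)
        while self._parent[k] != k:
            self._parent[k] = self._parent[self._parent[k]]
            k = self._parent[k]
        return k

    def merge(self, x, y):
        # Explicitly declare x and y equivalent by joining their sets.
        rx, ry = self._find(self._key(x)), self._find(self._key(y))
        if rx != ry:
            self._parent[rx] = ry

    def are_equivalent(self, x, y):
        # Same root means same equivalence set.
        return self._find(self._key(x)) == self._find(self._key(y))


# Case-insensitive strings: the key gives implicit equivalence,
# merge() adds explicit equivalence on top of it.
eq = EquivalenceSketch(key=str.lower)
eq.merge("a", "b")
eq.merge("b", "c")
```

With this sketch, eq.are_equivalent("A", "c") holds through a combination of the key (which maps "A" to "a") and transitivity over the two merges, while an unrelated string such as "d" stays in its own set.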

Get it from PyPI: http://pypi.python.org/pypi/equivalence/

Example
=======

Say that you are given a bunch of URLs you want to download and eventually process somehow. These URLs may contain duplicates, either exact or leading to a page with the same content (e.g. redirects, plagiarized pages, etc.). What you'd like is to identify duplicates in advance so that you can process only unique pages. More formally, you want to partition the given URLs into equivalence sets and pick a single representative from each set.

Getting rid of identical URLs is trivial. A more general case, URLs that can be easily identified as duplicates, can be handled with some simple regular-expression-based heuristics, so that for instance 'http://python.org/doc/' and 'www.python.org/doc/index.html' are deemed equivalent. For this case you may have a normalize(url) function that reduces a URL to its "stem" (e.g. 'python.org/doc') and use this as a key for deciding equivalence.

This is fine, but it still leaves quite a few URLs that cannot be recognized as duplicates with simple heuristics. For these harder cases you may have one or more "oracles" (an external database, a page comparison program, or ultimately a human) that decide whether pages x and y are equivalent. You can integrate such oracles by explicitly declaring objects as equivalent using Equivalence.merge(x,y).

Both implicit (key-based) and explicit information are combined to maintain the equivalence sets. For instance:

>>> from equivalence import Equivalence
>>> dups = Equivalence(normalize)  # for an appropriate normalize(url)
>>> dups.merge('http://python.org/doc/', 'http://pythondocs.com/')
>>> dups.are_equivalent('www.pythondocs.com/index.htm', 'http://python.org/doc/index.html')
True

You can find more about the API in the included docs and the unittest file.

Regards,
George