On 27 December 2010 22:05, Ulrich Eckhardt <dooms...@knuut.de> wrote:
> What I'm now considering is to only allow a single instance of these > objects > for each set of values, similar to interned strings. What I would gain is > that I could safely compare objects for identity instead of equality. What > I'm not yet sure is how much overhead that would give me and/or how to keep > it low. The idea is to store each instance in a set and after creating a > new > object I would first look up an equal object in the global set and return > that instead, otherwise add the new one. > > The problem I foresee is that if I define equality as identity, this lookup > when creating will never eliminate duplicates. If I only fall back to > equality comparison for non-identical objects, I would probably sacrifice > most of the gain. If I build a dict mapping between the values and the > actual objects, I would have doubled the required memory and uselessly > store > the same values twice there. > The first thing to deal with the equality check. The way this is generally done is to first do an identity check, then if that fails fall back to an equality check. This gives you a fast path for the normal case, but still gives full equality checks on a slow path. Your assumption of double storage for a dict is somewhat flawed if I understand you correctly. The mapping: (value1, value2, ...) => my_object(value1, value2, ...) *could* result in value1, value2, ... being created and stored twice (hence the possibility of double storage) and the mapping tuple being stored + your object. However, if the key and value are the same object, there is only a single additional reference being stored (within the dict structure of course). The way you should probably deal with this is to always create one of your objects for doing the lookup. Then your algorithm is: new_object = my_object(value1, value2, ...) try: canonical = canonical_dict[new_object] except KeyError: canonical = canonical_dict[new_object] = new_object You'd have to structure your __new__ appropriately to do it there, but it is possible assuming that everything you need for equality testing is done in __new__. If you further want to reduce storage (if it's an issue) you could also canonicalise the values themselves using a similar technique. You could even use the same canonicalisation dictionary so long as you could ensure that none of the different types compare equal (e.g. floats and integers). Note that as an implementation detail the integers -5...256 are already interned, but you can't rely on that (the range has changed over time). Tim Delaney
-- http://mail.python.org/mailman/listinfo/python-list