Dave Malcolm <dmalc...@redhat.com> added the comment: On Thu, 2012-01-26 at 21:04 +0000, Alex Gaynor wrote: > Alex Gaynor <alex.gay...@gmail.com> added the comment: > > On Thu, Jan 26, 2012 at 4:00 PM, Martin v. Löwis > <rep...@bugs.python.org>wrote: > > > > > Martin v. Löwis <mar...@v.loewis.de> added the comment: > > > > I'd like to propose an entirely different approach: use AVL trees for > > colliding strings, for dictionaries containing only unicode or byte strings. > > > > A prototype for this is in > > http://hg.python.org/sandbox/loewis/branches#avl > > It is not fully working yet, but I'm now confident that this is a feasible > > approach. > > > > It has the following advantages over the alternatives: > > - performance in case of collisions is O(log(N)), where N is the number of > > colliding keys > > - no new exceptions are raised, except for MemoryError if it runs out of > > memory for allocating nodes in the tree > > - the hash values do not change > > - the dictionary order does not change as long as no keys collide on hash > > values (which for all practical purposes should mean that the dictionary > > order does not change in all places where it matters) > > > > ---------- > > nosy: +loewis > > > > _______________________________________ > > Python tracker <rep...@bugs.python.org> > > <http://bugs.python.org/issue13703> > > _______________________________________ > > > > Martin, > > What happens if, instead of putting strings in a dictionary directly, I > have them wrapped in something. For example, the classes Antoine and I > pasted early. These define hash and equal as being strings, but don't have > an ordering.
[Obviously I'm not Martin, but his idea really interests me] Looking at: http://hg.python.org/sandbox/loewis/file/58be269aa0b1/Objects/dictobject.c#l517 as soon as any key insertion or lookup occurs where the key isn't exactly one of the correct types, the dict flattens any AVL trees back into the regular flat representation (and switches to lookdict for ma_lookup), analogous to the existing ma_lookup transition on dicts. >From my reading of the code, if you have a dict purely of bytes/str, collisions on a hash value lead to the PyDictEntry's me_key being set to an AVL tree. All users of the ma_lookup callback within dictobject.c check to see if they're getting such PyDictEntry back. If they are, they call into the tree, which leads to TREE_FIND(), TREE_INSERT() and TREE_DELETE() invocations as appropriate; ultimately, the AVL macros call back to within node_cmp(): PyObject_Compare(left->key, right->key) [Martin, I'm sorry if I got this wrong] So *if* I'm reading the code correctly, it might be possible to generalize it from {str, bytes} to any set of types within which ordering and equality checking of instances from any type is "sane", which loosely, would seem to be: that we can reliably compare all objects from any type within the set, so that we can use the comparisons to perform a search to hone in on a pair of keys that compare as "equal", without any chance of raising exceptions, or missing a valid chance for two objects to be equal etc. I suspect that you can't plug arbitrary user-defined types into it, since there's no way to guarantee that ordering and comparison work in the ways that the AVL lookup requires. But I could be misreading Martin's code. [thinking aloud, if a pair of objects don't implement comparison at the PyObject_Compare level, is it possible to instead simply compare the addresses of the objects? I don't think so, since you have a custom equality implementation in your UDT, but maybe I've missed something?] Going higher-level, I feel that there are plenty of attacks against pure-str/bytes dicts, and having protection against them is worthwhile - even if there's no direct way to use it to protect the use-case you describe. Hope this is helpful Dave ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue13703> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com