On Monday, August 27, 2012 2:17:45 PM UTC-5, Ian wrote: > On Thu, Aug 23, 2012 at 10:49 AM, Aaron Brady <castiro...@gmail.com> wrote: > > > The patch for the above is only 40-60 lines. However it introduces two new > > concepts. > > > > Is there a link to the patch?
Please see below. It grew somewhat during development. > > The first is a "linked list". SNIP. > > The second is "uncounted references". The uncounted references are > > references to "set iterators" exclusively, exist only internally to "set" > > objects, and are invisible to the rest of the program. The reason for the > > exception is that iterators are unique in the Python Data Model; iterators > > consist of a single immutable reference, unlike both immutable types such > > as strings and numbers, as well as container types. Counted references > > could be used instead, but would be consistently wasted work for the > > garbage collector, though the benefit to programmers' peace of mind could > > be significant. > > > > > > Please share your opinion! Do you agree that the internal list resolves > > the inconsistency? Do you agree with the strategy? Do you agree that > > uncounted references are justified to introduce, or are counted references > > preferable? > > > > This feature is a hard sell as it is; I think that adding uncounted > > references into the mix is only going to make that worse. May I > > suggest an alternate approach? Internally tag each set or dict with a > > "version", which is just a C int. Every time the hash table is > > modified, increment the version. When an iterator is created, store > > the current version on the iterator. When the iterator is advanced, > > check that the iterator version matches the dict/set version. If > > they're not equal, raise an error. > > > > This should add less overhead than the linked list without any > > concerns about reference counting. It does introduce a small bug in > > that an error condition could be "missed", if the version is > > incremented a multiple of 2**32 or 2**64 times between iterations -- > > but how often is that really likely to occur? Bearing in mind that > > this error is meant for debugging and not production error handling, > > you could even make the version a single byte and I'd still be fine > > with that. > > > > Cheers, > > Ian Hi Ian, We could use a Python long object for the version index to prevent overflow. Combined with P. Rubin's idea to count the number of open iterators, most use cases still wouldn't exceed a single word comparison; we could reset the counter when there weren't any. Using the linked list collection, modification operations are expensive in rare cases. Using the version index, iteration is expensive in rare cases. I was more interested in the linked list for conceptual reasons, so I developed it further. Changelog, diff file, test suite, and links are below. The devs should be aware that a competing patch might be developed. I would be pleased to hear what everybody thinks of it! Linked list with uncounted references implementation for Win32. Added: - 'set_clear_ex' and 'set_clear_internal_ex' methods, differ in invalidation and conditional invalidation behavior and return type.. The 'set.clear()' method and 'tp_clear' type field both called the same method. - 'set_invalidate_iter_linked' method. Iterate over the iterators of a set, mark them invalid, and clear the list. - 'setiter_unlink_internal' method. Remove the iterator from the set's linked list of iterators. - 'IterationError', global. - New fields: -- PySetObject: setiterobject *iter_linked. Pointer to the first element of the linked list of the iterators of the set. -- setiterobject: setiterobject *linked_pred, *linked_succ. Predecessor and successor nodes in the linked list of iterators of the same set. -- setiterobject: char ob_valid. Validation status of the iterator. - Result is compared with original in 'set_intersection_update' and '_multi' to determine whether to invalidate the list of iterators. Asymptotic running time is unchanged. - Pending: add 'tp_clear' field to 'PySetIter_Type'? - Test script included, 'large numbers' test pending. 6 files changed: { setobject.h, setobject.c, exceptions.c, pyerrors.h, python3.def, python33stub.def }. Test script 'set_iterator_test.py' new. Linked list interface and pseudocode 'patch_pseudocode.txt'. Zip file: http://home.comcast.net/~castironpi-misc/clpy-0062-set_iterator_patch.zip Diff file of 3.3.0b2: http://home.comcast.net/~castironpi-misc/clpy-0062-set_iterator_diff.txt -- http://mail.python.org/mailman/listinfo/python-list