Hello,

I observed an inconsistency in the behavior of 'set' and 'dict' iterators.  It 
is "by design" according to the docs.

'''
http://docs.python.org/dev/library/stdtypes.html#dict-views

iter(dictview).  Iterating views while adding or deleting entries in the 
dictionary may raise a RuntimeError or fail to iterate over all entries.
'''

The 'set' has the same behavior.  Iteration might also complete successfully.

The inconsistency is, if we remove an element from a set and add another during 
iteration, the new element might appear later in the iteration, and might not, 
depending on the hash code; therefore comparing the size of the set between 
iterations isn't adequate.  Example:

http://home.comcast.net/~castironpi-misc/clpy-0062%20set%20iterators.py  

'''
# py: { 'ver': '3' }

set0= set( ( 1, 2 ) )

iter0= iter( set0 )
print( next( iter0 ) )

set0.add( 3 )
set0.remove( 2 )
print( next( iter0 ) )


print( )

set0= set( ( 6, 7 ) )

iter0= iter( set0 )
print( next( iter0 ) )

set0.add( 8 )
set0.remove( 7 )
print( next( iter0 ) )
'''

Output:

'''
1
3

6
Traceback (most recent call last):
  File [...] line 22, in <module>
    print( next( iter0 ) )
StopIteration
'''

Iteration should behave the same regardless of the contents of the set.  
Continuing iteration over sets and dicts after a modification isn't defined; it 
should unconditionally raise an error.

What's going on, is '8' is added before the position of the iterator due to 
hashing in the second part, but the size doesn't change, so the iterator 
reaches the end of the set after '7' is removed.

The inconsistency isn't easily solved.  One possibility is to use a timestamp 
or other serial index in the object and iterators, and compare them on every 
iteration to determine if a modification has occurred.

Another possibility which the author prefers, is to maintain a secondary 
collection of the iterators of an object, and invalidate them upon 
modification.  The applicable collection structure is a doubly-linked linked 
list, informally depicted:

http://home.comcast.net/~castironpi-misc/clpy-0062%20set%20iterators.png

Upon modification, the set traverses its iterators, setting an 'invalid' flag 
on each; and subsequent calls to any of them raise an 'IterationError'.  Adding 
and removing iterators to and from the secondary list is performed in O( 1 ) 
time with no penalty.

The above example depicted a 'Set'.  'Dicts' have the same anomaly, but the 
solution is ambiguous, since dict values can be changed meaningfully without 
altering the structure of the object.  In the author's opinion, the dict should 
not raise an 'IterationError' on value changes, only key changes like the set, 
but the argument isn't conclusive.
-- 
http://mail.python.org/mailman/listinfo/python-list

Reply via email to