New submission from Andrew Dalke:

The file iterator is "deemed broken". As I don't think it should be made 
non-broken, I suggest the documentation should be changed to point out when 
file iteration is broken. I also think the term 'broken' is a label with 
needlessly harsh connotations and should be softened.

The iterator documentation uses the term 'broken' like this (quoting here from
stdtypes.html):

  Once an iterator’s __next__() method raises StopIteration,
  it must continue to do so on subsequent calls. Implementations
  that do not obey this property are deemed broken.

(Older versions comment "This constraint was added in Python 2.3; in Python 
2.2, various iterators are broken according to this rule.")

An IOBase is supposed to support the iterator protocol (so says io.html). 
However, it does not fully do so, nor does the documentation say that it's 
broken in the face of a changing file (e.g., when another process appends to a 
log file).

  % ./python.exe 
  Python 3.5.0a1+ (default:4883f9046b10, Feb 11 2015, 04:30:46) 
  [GCC 4.8.4] on darwin
  Type "help", "copyright", "credits" or "license" for more information.
  >>> f = open("empty")
  >>> next(f)
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  StopIteration
  >>> ^Z
  % echo "Hello!" >> empty
  % fg

  >>> next(f)
  'Hello!\n'
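
The same resumption can be reproduced without suspending the interpreter. 
The following is a minimal sketch, assuming CPython's current text-file 
behavior; the temporary file and the names before/after are illustrative 
only, and a second open() in append mode stands in for the other process.

```python
import os
import tempfile

# Create an empty file to stand in for the "empty" file above.
fd, path = tempfile.mkstemp()
os.close(fd)

with open(path) as f:
    # The file is empty, so the iterator is exhausted immediately.
    before = next(f, None)

    # Simulate another process appending to the file.
    with open(path, "a") as writer:
        writer.write("Hello!\n")

    # The same iterator object now yields the appended line.
    after = next(f, None)

os.remove(path)
print(repr(before), repr(after))
```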

This is apparently well-known behavior, as I've come across several references 
to it on various Python-related lists, including this one from Miles in 2008:

  Strictly speaking, file objects are broken iterators:

Fredrik Lundh in the same thread says:

  it's a design guideline, not an absolute rule

The 7+ years of 'broken' behavior in Python suggests that /F is correct. But 
while 'broken' could be considered a meaningless label, it carries with it some 
rather negative connotations. It sounds like developers are supposed to make 
every effort to avoid broken code, when that's not something Python itself 
does. It also means that my code can be called "broken" solely because it 
assumed Python file iterators are non-broken. I am not happy when people say my 
code is broken.

It is entirely reasonable that a seek(0) would reset the state and cause 
next(it) to not continue to raise a StopIteration exception. However, errors 
can arise when using Python file objects, as an iterator, to parse a log file 
or any other files which are appended to by another process.
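
The seek(0) case can be sketched as follows. This is an illustration only; 
the temporary file and the variable names are assumptions made for the 
example.

```python
import os
import tempfile

# Write a small two-line file to iterate over.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as writer:
    writer.write("first\nsecond\n")

with open(path) as f:
    lines = list(f)            # consume the iterator to exhaustion
    exhausted = next(f, None)  # None: StopIteration was raised again
    f.seek(0)                  # rewind to the start of the file
    resumed = next(f, None)    # iteration starts over from the first line

os.remove(path)
print(lines, exhausted, repr(resumed))
```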

Here's an example of code that can break. It extracts the first and last 
elements of an iterator; more specifically, the first and last lines of a file. 
If there are no lines it returns None for both values; and if there's only one 
line then it returns the same line as both values.

  def get_first_and_last_elements(it):
    first = last = next(it, None)
    for last in it:
        pass
    return first, last

This code expects a non-broken iterator. If passed a file, and the file were 1) 
initially empty when next() was called, and 2) appended to by the time 
Python reaches the for loop, then it's possible for the first value to be None 
while the last is a string.

This is unexpected, undocumented, and may lead to subtle errors.
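
The timing above is hard to hit on demand with a real file, so here is a 
deterministic sketch of the same failure. LateDataIterator is a hypothetical 
stand-in, not a real file object: it raises StopIteration once, as an empty 
file would, and then resumes as if data had been appended in the meantime.

```python
class LateDataIterator:
    """Hypothetical stand-in for a file that is empty at first and is
    appended to right after the first StopIteration."""

    def __init__(self, late_lines):
        self._late_lines = list(late_lines)
        self._signalled_end = False

    def __iter__(self):
        return self

    def __next__(self):
        if not self._signalled_end:
            # First call: behave like an empty file.
            self._signalled_end = True
            raise StopIteration
        if self._late_lines:
            # Later calls: data has "arrived", so iteration resumes.
            return self._late_lines.pop(0)
        raise StopIteration


def get_first_and_last_elements(it):
    first = last = next(it, None)
    for last in it:
        pass
    return first, last


result = get_first_and_last_elements(LateDataIterator(["Hello!\n", "Bye!\n"]))
print(result)  # (None, 'Bye!\n')
```

The first element is reported as None even though the iterator went on to 
produce two lines.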

There are work-arounds, like ensuring that the StopIteration only occurs once:

  def get_first_and_last_elements(it):
    first = last = next(it, None)
    if last is not None:
        for last in it:
            pass
    return first, last

but much existing code expects non-broken iterators, such as the Python example 
implementation at . (I have 
a reproducible failure using it, a fork(), and a file iterator with a sleep() 
if that would prove useful.)

Another option is to have a wrapper around file object iterators to keep 
raising StopIteration, like:

   def safe_iter(it):
       yield from it

   # -or-  (line for line in file_iter)

but people need to know to do this with file iterators or other potentially 
broken iterators. The current documentation does not say when file iterators 
are broken, and I don't know which other iterators are also broken.
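
A sketch of how the wrapper changes the behavior, assuming CPython's current 
text-file behavior. The temporary file and variable names are illustrative, 
and a second open() in append mode stands in for the other process.

```python
import os
import tempfile


def safe_iter(it):
    # A generator that has finished keeps raising StopIteration.
    yield from it


fd, path = tempfile.mkstemp()
os.close(fd)

with open(path) as f:
    lines = safe_iter(f)
    first_try = next(lines, None)    # file is empty: the generator finishes

    with open(path, "a") as writer:  # simulate another process appending
        writer.write("Hello!\n")

    wrapped = next(lines, None)      # the wrapper stays exhausted
    raw = next(f, None)              # the underlying file iterator resumes

os.remove(path)
print(first_try, wrapped, repr(raw))
```

The trade-off is that the wrapper never sees the appended data; reading it 
requires going back to the file object or creating a new wrapper.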

I realize this is a tricky issue.

I don't think it's possible now to change the file's StopIteration behavior. I 
expect that there is code which depends on the current brokenness, the ability 
to seek() and re-iterate is useful, and the idea that next() returns text if 
and only if readline() is not empty is useful and well-entrenched. PyPy has the 
same behavior as CPython so any change will take some time to propagate to the 
other implementations.

Instead, I'm fine with a documentation change in io.html. It currently says:

  IOBase (and its subclasses) support the iterator protocol,
  meaning that an IOBase object can be iterated over yielding
  the lines in a stream. Lines are defined slightly differently
  depending on whether the stream is a binary stream (yielding
  bytes), or a text stream (yielding unicode strings). See
  readline() below.

I suggest adding something like:

  The file iterator does not completely follow the iterator protocol.
  If new data is added to the file after the iterator raises
  a StopIteration then next(file) will resume returning lines.
  The safest way to iterate over lines in a log file or other
  changing file is to use a generator expression:

     (line for line in file)

  The iterator may also resume after using seek() to move
  the file position.

You'll note that I failed to use the term "broken". This should really start:

   The file iterator is broken.

I find that term rather harsh, and since broken iterators are acceptable in 
Python, I suggest toning down or qualifying the use of "broken" in 
stdtypes.html. I have no suggestions for an improved version.

assignee: docs@python
components: Documentation
messages: 235850
nosy: dalke, docs@python
priority: normal
severity: normal
status: open
title: file iterator "deemed broken"; can resume after StopIteration
versions: Python 3.5
