On 01/18/2014 09:51 AM, Keith Winston wrote:
I don't really get iterators. I saw an interesting example on
Stackoverflow, something like

with open('workfile', 'r') as f:
     for a, b, c in zip(f, f, f):
....

And this iterated through a, b, c assigned to 3 consecutive lines of
the file as it iterates through the file. I can sort of pretend that
makes sense, but then I realize that other things that I thought were
iterators aren't (lists and the range function)... I finally succeeded
in mocking this up with a generator:

gen = (i for i in range(20))
for t1, t2, t3 in zip(gen, gen, gen):
     print(t1, t2, t3)

So I'm a little more confident of this... though I guess there's some
subtlety of how zip works there that's sort of interesting. Anyway,
the real question is, where (why?) else do I encounter iterators,
since my two favorite examples, aren't... and why aren't they, if I
can iterate over them (can't I? Isn't that what I'm doing with "for
item in list" or "for index in range(10)")?

An iterator is a kind of object that delivers items one at a time. It is meant to be used with Python's "for ... in ..." construct.

Concretely, for each pass of such a 'for' loop, Python calls the iterator's __next__ method. If the call returns an item, it is used in that pass; if the call raises StopIteration, the loop stops. Here are two examples of iterators and their usage (ignore the __iter__ method at first; it is explained below):

======================================================
class Cubes:
    def __init__ (self, max):
        self.num = 0
        self.max = max

    def __next__ (self):
        if self.num > self.max:
            raise StopIteration()
        item = self.num * self.num * self.num
        self.num += 1
        return item

    def __iter__ (self):
        return self

cubes9 = Cubes(9)

for cube in cubes9:
    print(cube, end=' ')
print()

class Odds:
    def __init__ (self, lst):
        self.idx = 0
        self.lst = lst

    def __next__ (self):
        # find next odd item, if any:
        while self.idx < len(self.lst):
            item = self.lst[self.idx]
            self.idx += 1
            if item % 2 == 1:
                return item
        # if none:
        raise StopIteration()

    def __iter__ (self):
        return self

l = [0,1,2,3,4,5,6,7,8,9,10]
odds = Odds(l)
for odd in odds:
    print(odd, end=' ')
print()
======================================================

As you can see, the relevant bit is the __next__ method. It and __iter__ are the two methods that form the "iterator protocol", which every iterator is required to implement.
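To see __next__ at work without any 'for' loop, here is a minimal sketch that steps through an iterator by hand. The Countdown class is a made-up example (it is not one of the types above), and the builtin next(it) is simply the standard way of calling it.__next__():

```python
class Countdown:
    """Hypothetical iterator for illustration: counts from start down to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        item = self.current
        self.current -= 1
        return item


it = Countdown(3)
first = next(it)     # next(it) calls it.__next__()
second = next(it)
third = next(it)

# A fourth call finds nothing left, so __next__ raises StopIteration.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)
```

A 'for' loop does the same thing, except that it catches StopIteration for you and quietly ends the loop.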

There is a little subtlety: sequences like lists are not iterators. For users to be able to iterate over sequences like lists, directly, *in code*:
        for item in lst:
instead of:
        for item in iter(lst):
Python performs a little magic: a 'for' loop calls iter() on the object it is given (here lst), which looks for an __iter__ method, calls it, and, if this returns an iterator (one respecting the iterator protocol), loops over that iterator instead. This is why actual iterators are required to have an __iter__ method too, so that iterators and sequences can be used interchangeably in 'for' loops. Since iterators already are iterators, their __iter__ just returns self.
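The magic can be spelled out by hand. The following is only a rough sketch of what a 'for' loop does, not the interpreter's actual code; the variable names are mine. The last part also suggests why the zip(gen, gen, gen) trick from the question groups items by threes, while zip(lst, lst, lst) would not:

```python
lst = [10, 20, 30]

# A list has __iter__ but no __next__: it is iterable, not an iterator.
list_is_iterator = hasattr(lst, '__next__')

# Roughly what "for item in lst:" does behind the scenes.
collected = []
it = iter(lst)               # calls lst.__iter__(), giving a fresh list iterator
while True:
    try:
        item = next(it)      # calls it.__next__()
    except StopIteration:
        break                # the loop ends silently
    collected.append(item)

# An iterator's own __iter__ just returns the iterator itself.
same_object = iter(it) is it

# zip calls iter() on each argument. Given the same iterator three times,
# it gets the same object three times, so every item is consumed only once.
gen = iter(range(6))
triples = list(zip(gen, gen, gen))

# Given the same list three times, it gets three independent iterators.
repeated = list(zip(lst, lst, lst))

print(list_is_iterator, collected, same_object)
print(triples)
print(repeated)
```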

Exercise: simulate Python's iterator magic for lists. E.g. make a 'List' type (a subtype of list) and implement its __iter__ method. It should create an iterator object of type, say, ListIter, which itself implements the iterator protocol and correctly provides the list's items. (As you may guess, it is a simpler version of my Odds type above.) (Dunno how to do that for sets or dicts, since on the Python side we have no access that I know of to their actual storage of items/pairs. In fact, this applies to lists as well, but indexing provides indirect access.)
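If you want to check your attempt, here is one possible sketch of the exercise, certainly not the only way to do it. It uses the names List and ListIter suggested above and relies on indexing, as hinted, to reach the items:

```python
class ListIter:
    """Iterator over a list-like object, using only len() and indexing."""

    def __init__(self, lst):
        self.lst = lst
        self.idx = 0

    def __iter__(self):
        # An iterator is its own iterator.
        return self

    def __next__(self):
        if self.idx >= len(self.lst):
            raise StopIteration
        item = self.lst[self.idx]
        self.idx += 1
        return item


class List(list):
    """Subtype of list whose __iter__ hands out our own iterator type."""

    def __iter__(self):
        return ListIter(self)


items = List(['a', 'b', 'c'])

result = []
for item in items:           # the 'for' loop calls items.__iter__()
    result.append(item)

print(result)
print(type(iter(items)).__name__)
```

Each call to __iter__ creates a new ListIter with its own idx, so two loops over the same List do not disturb each other.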

[Note, just to compare: in Lua, this little magic making builtin sequences special does not exist. So, to iterate over all pairs or all indexed items of a Lua table, one writes explicitly, respectively:
        for key,val in pairs(t)
        for idx,item in ipairs(t)
where pairs & ipairs respectively create iterators over the (key,val) pairs or the (index,item) pairs of a table (tables being used like Python dicts or lists). The functions pairs & ipairs are builtin, but it's also trivial to make iterators (or generators) in Lua: since it has 'free' objects, we don't even need classes for that.]

Now, one may wonder: why don't sequences implement the iterator protocol themselves (actually, just __next__), so we could get rid of all that mess? Well, this mess permits:
* a variety of traversals, with correspondingly different iterators, for the *same* (kind of) collection; for instance traversing a list backward, or traversing trees breadth-first, depth-first, only their leaves, or only nodes with elements...
* the same collection to be traversed in several loops at once (rarely needed, but still); concretely in nested loops (in principle also from multiple threads concurrently); and this does not break (as long as the list itself remains unchanged).
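Both points can be tried out on a plain list. This is just a small sketch with made-up variable names; reversed() is the builtin that gives a backward iterator over a sequence. The last part shows what would go wrong if the two nested loops had to share one single iterator:

```python
lst = [1, 2, 3]

# Nested loops over the same list: each 'for' gets its own iterator,
# so the inner loop does not disturb the outer one.
pairs = []
for a in lst:
    for b in lst:
        pairs.append((a, b))

# A different traversal of the same list, through a different iterator.
backward = list(reversed(lst))

# By contrast, when both loops share one iterator, the inner loop
# uses up the items and the outer loop stops after its first pass.
shared = iter(lst)
shared_pairs = []
for a in shared:
    for b in shared:
        shared_pairs.append((a, b))

print(len(pairs), backward, shared_pairs)
```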

Now, you may imagine that, since there are builtin iterators for all of Python's "iterable" types, the corresponding magic is also builtin, and custom types are constructed from builtin ones, there is rarely a need for making custom iterators and mastering the corresponding lower-level functioning. And you'd certainly be right ;-)

Why do you want to explore that, now?

Denis

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
