Hi, I wrote simple dummy examples to teach to myself iterators and generators.
I think that the iteration protocol brings a very neat confusion between the iterator and what it iterates upon (i.e. the iteratable). This is outlined in "example 3" in my dummy examples. What are your feelings about it ? Regards Francis Girard ==== BEGINNING OF EXAMPLES ==== from itertools import tee, imap, izip, islice import sys import traceback sEx1Doc = \ """ ================================================================================ Example 1: ================================================================================ An ""iteratable"" class is a class supporting the __iter__ method which should return an ""iterator"" instance, that is, an instance of a class supporting the "next" method. An iteratable, strictly speaking, doesn't have to support the "next" method and an "iterator" doesn't have to support the "__iter__" method (but this breaks the iteration protocol as we will later see). The ""for ... in ..."" construct now expect an iteratable instance in its second argument place. It first invoke its __iter__ method and then repeatedly invoke the ""next"" method on the object returned by ""__iter__"" until the ""StopIteration"" exception is raised. The designing goal is to cleanly separate a container class with the way we iterate over its internal elements. """ class Ex1_IteratableContClass: def __init__(self): print "Ex1_IteratableContClass.__init__" self._vn = range(0,10) def elAt(self, nIdx): print "Ex1_IteratableContClass.elAt" return self._vn[nIdx] def __iter__(self): print "Ex1_IteratableContClass.__iter__" return Ex1_IteratorContClass(self) class Ex1_IteratorContClass: def __init__(self, cont): print "Ex1_IteratorContClass.__init__" self._cont = cont self._nIdx = -1 def next(self): print "Ex1_IteratorContClass.next" self._nIdx += 1 try: return self._cont.elAt(self._nIdx) except IndexError: raise StopIteration print print sEx1Doc print for n in Ex1_IteratableContClass(): print n, sys.stdout.flush() sEx2Doc = \ """ ================================================================================ Example 2: ================================================================================ Let's say that we want to give two ways to iterate over the elements of our example container class. The default way is to iterate in direct order and we want to add the possibility to iterate in reverse order. We simply add another ""iterator"" class. We do not want to modify the container class as, precisely, the goal is to decouple the container from iteration. """ class Ex2_IteratableContClass: def __init__(self): print "Ex2_IteratableContClass.__init__" self._vn = range(0,10) def elAt(self, nIdx): print "Ex2_IteratableContClass.elAt" return self._vn[nIdx] def __iter__(self): print "Ex2_IteratableContClass.__iter__" return Ex1_IteratorContClass(self) class Ex2_RevIteratorContClass: def __init__(self, cont): print "Ex2_RevIteratorContClass.__init__" self._cont = cont self._nIdx = 0 def next(self): print "Ex2_RevIteratorContClass.next" self._nIdx -= 1 try: return self._cont.elAt(self._nIdx) except IndexError: raise StopIteration print print sEx2Doc print print "Default iteration works as before" print for n in Ex2_IteratableContClass(): print n, sys.stdout.flush() print print "But reverse iterator fails with an error : " print cont = Ex2_IteratableContClass() try: for n in Ex2_RevIteratorContClass(cont): print n, sys.stdout.flush() except: traceback.print_exc() sEx3Doc = \ """ ================================================================================ Example 3. ================================================================================ The previous example fails with the "iteration over non sequence" error because the "Ex2_RevIteratorContClass" iterator class doesn't support the __iter__ method. We therefore have to supply one, even at the price of only returning ""self"". I think this is ugly because it baffles the distinction between iterators and iteratables and we artificially have to add an ""__iter__"" method to the iterator itself, which should return ... well, an iterator, which it already is ! I presume that the rationale behind this is to keep the feature that the second argument place of the ""for ... in ..."" should be filled by a container (i.e. an iteratable), not an iterator. Another way that Python might have done this without breaking existing code is to make the ""for ... in ..."" construct invoke the __iter__ method if it exists, otherwise, directly call the ""next"" method on the supplied object. But this is only a minor quirk as the decoupling of the iterator from the iteratable is nonetheless achieved at the (small) price of adding an "almost" useless method. So here it is. """ class Ex3_RevIteratorContClass: def __init__(self, cont): print "Ex3_RevIteratorContClass.__init__" self._cont = cont self._nIdx = 0 def __iter__(self): print "Ex3_RevIteratorContClass.__iter__" return self ## zut ! def next(self): print "Ex3_RevIteratorContClass.next" self._nIdx -= 1 try: return self._cont.elAt(self._nIdx) except IndexError: raise StopIteration print print sEx3Doc print cont = Ex2_IteratableContClass() for n in Ex3_RevIteratorContClass(cont): print n, sys.stdout.flush() sEx4Doc = \ """ ================================================================================ Example 4. ================================================================================ Since version 2.2, a new built-in function is offered that simply takes an iteratable as argument and returns an iterator by calling its "__iter__". As long as only iteratables/iterators are involved, this is no big deal as ""iter(anIteratable)"" is strictly equivalent to ""anIteratable.__iter__()"". The real advantage in using this function is that is also accepts containers supporting the sequence protocol (the __getitem__() method with integer arguments starting at 0). The ""iter"" methods also returns an iterator for this case. We can therefore write a function accepting a container as argument that, with the same code, can either support the iteration protocol or the sequence protocol. """ class Ex4_IteratableContClass: def __init__(self): print "Ex4_IteratableContClass.__init__" self._vn = range(0,10) def elAt(self, nIdx): print "Ex4_IteratableContClass.elAt" return self._vn[nIdx] def __iter__(self): print "Ex4_IteratableContClass.__iter__" return Ex4_IteratorContClass(self) class Ex4_IteratorContClass: def __init__(self, cont): print "Ex4_IteratorContClass.__init__" self._cont = cont self._nIdx = -1 def __iter__(self): print "Ex4_RevIteratorContClass.__iter__" return self ## zut ! def next(self): print "Ex4_IteratorContClass.next" self._nIdx += 1 try: return self._cont.elAt(self._nIdx) except IndexError: raise StopIteration class Ex4_SequenceContClass: def __init__(self): print "Ex4_SequenceContClass.__init__" self._vn = range(0,10) def elAt(self, nIdx): print "Ex4_SequenceContClass.elAt" return self._vn[nIdx] def __getitem__(self, key): print "Ex4_IteratableContClass.__getitem__" return self.elAt(key) def Ex4_FuncPrintContainerEl(cont): for n in iter(cont): print n, sys.stdout.flush() print print sEx4Doc print print print "Applying Ex4_FuncPrintContainerEl to an iteratable container" print Ex4_FuncPrintContainerEl(Ex4_IteratableContClass()) print print "Applying Ex4_FuncPrintContainerEl to a sequence container" print Ex4_FuncPrintContainerEl(Ex4_SequenceContClass()) sEx5Doc = \ """ ================================================================================ Example 5. ================================================================================ Introducing Generators The "generator" term itself is frequently used with two different meanings. It is sometimes used to mean "generator function" and sometimes to mean "generator-iterator". The PEP255 and python mailing list are the two only places where I could read something that clearly distinguishes the two notions. A generator-function can be first seen as only a notational artefact to generate an iterator. The generated iterator is called a generator-iterator and it is, by its intimate nature, both an iterator and an iteratable. (This might very well be another reason why the iteration protocol baffles the distinction betweeen iterators and iteratables.) Typically (but not necessarily), the generator-iterator is such that the elements do not pre-exist but are "generated" as needed while iteration is going on. Here is an example of a generator-function and the equivalent iterator class. The function ""Ex5_GenList"" can be seen as just a notational shortcut to the equivalent Ex5_GenListIteratorClass class. """ def Ex5_GenList(): print "Enters Ex5_GenList" i = 0 while i < 10: print "Ex5_GenList yields -- Equivalent to have a it.next() return" yield i i += 1 print "Ex5_GenList exits, equivalent to : raise StopIteration" class Ex5_GenListIteratorClass: def __init__(self): print "Ex5_GenListIteratorClass.__init__" self._i = 0 def __iter__(self): print "Ex5_GenListIteratorClass.__iter__" return self def next(self): print "Ex5_GenListIteratorClass.next" nReturn = self._i if nReturn > 9: print "Ex5_GenListIteratorClass.next raises the StopIteration exception" raise StopIteration self._i += 1 return nReturn print print sEx5Doc print print "The type of Ex5_GenList() is : %s" % type(Ex5_GenList()) print "Although it is really meant generator-iterator" print "As you can see, calling Ex5_GenList() didn't enter the function since our" print "trace \"Enters Ex5_GenList\" had not been printed." print "It only produced the generator-iterator." print print "for n in Ex5_GenList(): gives:" print for n in Ex5_GenList(): print n, sys.stdout.flush() print print "The type of Ex5_GenList is : %s" % type(Ex5_GenList) print "Although it is really meant generator-function" print print "for n in Ex5_GenList: gives:" print try: for n in Ex5_GenList: print n, sys.stdout.flush() except: traceback.print_exc() print print "The type of Ex5_GenList().__iter__() is : %s" % type(Ex5_GenList().__iter__()) print print "for n in Ex5_GenList().__iter__(): gives:" print for n in Ex5_GenList().__iter__(): print n, sys.stdout.flush() print print "it = Ex5_GenList()" print "for n in it(): gives:" print it = Ex5_GenList() try: for n in it(): print n, sys.stdout.flush() except: traceback.print_exc() print "The iterator produced by Ex5_GenList() is obviously not itself callable" print print "for n in Ex5_GenListIteratorClass(): gives:" print for n in Ex5_GenListIteratorClass(): print n, sys.stdout.flush() sEx6Doc = \ """ ================================================================================ Example 6 ================================================================================ After having distinguished iteratable from iterator and generator-function from generator-iterator, and having seen how generator-iterator makes iterable and iterator undistinguishables, we are now ready for some real work. There is now a real vocabulary problem in the Python community but I think that it is now clear enough. Our first pseudo-real-world example is to produce the Fibonacci sequence using a simple generator. The Ex6_Fibonacci() is a bit special as it never stops. This is one way to implement something similar to infinite lists in python. """ def Ex6_Fibonacci(): a = 1 b = 1 yield a yield b while True: bb = a + b a = b b = bb yield bb print print sEx6Doc print it = Ex6_Fibonacci() for i in xrange(0,10): print it.next() sEx7Doc = \ """ ================================================================================ Example 7 ================================================================================ A beautiful algorithm to produce the fibonacci sequence goes by noticing that : fib(i) = 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, ... fib(i+1) = 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, ... fib(i)+fib(i+1) = 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, ... Generators now enable us to ""run after our tail"". We will exploit the Fibonacci sequence property shown above by defining a function that produces and returns a "list" while the production algorithm supposes the list as having been already produced by recursively calling itself. In order to be able to do this, we need to initiate the processus by explicitly giving the first values. We also need that the list must be produced "lazily" ; i.e. the values must be produced only as needed ... well needed to produce the next element in the ""list"" in our case. Here is a first try that will almost work. """ ## Without the traces: ## ## def Ex7_Fibonacci(): ## yield 1 ## yield 1 ## itTail = Ex7_Fibonacci() ## itTail.next() # Skip the first one ## for n in (head + tail for (head, tail) in izip(Ex7_Fibonacci(), itTail)): ## yield n N_STACK_DEPTH = 0 def Ex7_Fibonacci(): global N_STACK_DEPTH N_STACK_DEPTH += 1 nStackDepth = N_STACK_DEPTH print "[%d] Entering Ex7_Fibonacci" % nStackDepth print "[%d] Yielding 1" % nStackDepth yield 1 print "[%d] Yielding 1" % nStackDepth yield 1 itTail = Ex7_Fibonacci() itTail.next() # Skip the first one for n in (head + tail for (head, tail) in izip(Ex7_Fibonacci(), itTail)): print "[%d] Yielding %d" % (nStackDepth, n) yield n print print sEx7Doc print print list(islice(Ex7_Fibonacci(), 5)) sEx8Doc = \ """ ================================================================================ Example 8 ================================================================================ Running after your tail with itertools.tee Example 7 shows that although the idea is beautiful, the implementation is very inefficient. To work efficiently, the beginning of the list must not be recomputed over and over again. This is ensured in most FP languages as a built-in feature. In python, we have to explicitly maintain a list of already computed results and abandon genuine recursivity. The "tee" function does just what we want. It internally keeps a generated result for as long as it has not been "consumed" from all of the duplicated iterators, whereupon it is deleted. You can therefore print the Fibonacci sequence during hours without increasing memory usage, or very little. The beauty of it is that recursive running after their tail FP algorithms are quite straightforwardly expressed with this Python idiom. """ def Ex8_Fibonacci(): print "Entering Ex8_Fibonacci" def _Ex8_Fibonacci(): print "Entering _Ex8_Fibonacci" yield 1 yield 1 fibTail.next() # Skip the first one for n in (head + tail for (head, tail) in izip(fibHead, fibTail)): yield n fibHead, fibTail, fibRes = tee(_Ex8_Fibonacci(), 3) return fibRes print print sEx8Doc print print list(islice(Ex8_Fibonacci(), 5)) -- http://mail.python.org/mailman/listinfo/python-list