But that is exactly the behaviour of Python iterators; I don't see what is broken.
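Here is a minimal sketch of what is going on. It is written for Python 3; the quoted code is Python 2, where itertools.izip behaves like the built-in zip used here. zip pulls one item from each iterable in turn, left to right. When the first one yields "fgh" and the second is then exhausted, zip stops and "fgh" is simply gone. If the shorter iterable comes first, zip sees it is exhausted before touching the second one, so nothing is lost. itertools.zip_longest (izip_longest from Python 2.6 on) is a lazy counterpart of the map(None, ...) padding idea. The helper names lost_item_demo and side_by_side, and the use of io.StringIO as stand-in files, are my own illustrative assumptions.

```python
import io
from itertools import zip_longest


def lost_item_demo():
    """Show that zip() silently drops an item taken from the longer, first iterator."""
    first = iter(["abc", "de", "fgh"])
    second = iter(["xyz", "wv"])
    pairs = list(zip(first, second))
    # On the third round zip() pulled "fgh" from `first`, then found `second`
    # exhausted, so "fgh" is neither in `pairs` nor left in `first`.
    leftover = list(first)
    return pairs, leftover


def side_by_side(file1, file2, fill=""):
    """Hypothetical helper: pair lines from two files, padding the shorter one with `fill`."""
    rows = []
    for line1, line2 in zip_longest(file1, file2, fillvalue=fill):
        rows.append(line1.rstrip("\n") + "\t" + line2.rstrip("\n"))
    return rows


pairs, leftover = lost_item_demo()
print(pairs)     # [('abc', 'xyz'), ('de', 'wv')]
print(leftover)  # []

for row in side_by_side(io.StringIO("abc\nde\nfgh\n"), io.StringIO("xyz\nwv\n")):
    print(row)
```

Padding with an empty string rather than the default fillvalue=None keeps the string handling simple; with None you would have to test for it before calling rstrip.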
izip/zip just read from the respective streams and give back a tuple if they can get one item from each; otherwise they stop. izip pulls from its arguments left to right, so if an earlier argument has already yielded an item when a later one turns out to be exhausted, that item is dropped, which is also why the result depends on argument order. And because Python iterators can only go in one direction, an item consumed inside the zip/izip call cannot be recovered afterwards. I think you need to use map(None, ...), which does not drop anything and just fills the missing positions with None. Though there is no lazy version of it, since imap(None, ...) doesn't behave like map but more like zip.

[EMAIL PROTECTED] wrote:
> The code below should be pretty self-explanatory.
> I want to read two files in parallel, so that I
> can print corresponding lines from each, side by
> side. itertools.izip() seems the obvious way
> to do this.
>
> izip() will stop iterating when it reaches the
> end of the shortest file. I don't know how to
> tell which file was exhausted so I just try printing
> them both. The exhausted one will generate a
> StopIteration, the other will continue to be
> iterable.
>
> The problem is that sometimes, depending on which
> file is the shorter, a line ends up missing,
> appearing neither in the izip() output, nor in
> the subsequent direct file iteration. I would
> guess that it was in izip's buffer when izip
> terminates due to the exception on the other file.
>
> This behavior seems plain out broken, especially
> because it is dependent on order of izip's
> arguments, and not documented anywhere I saw.
> It makes using izip() for iterating files in
> parallel essentially useless (unless you are
> lucky enough to have files of the same length).
>
> Also, it seems to me that this is likely a problem
> with any iterables with different lengths.
> I am hoping I am missing something...
>
> #---------------------------------------------------------
> # Task: print contents of file1 in column 1, and
> # contents of file2 in column two. iterators and
> # izip() are the "obvious" way to do it.
>
> from itertools import izip
> import cStringIO, pdb
>
> def prt_files (file1, file2):
>
>     for line1, line2 in izip (file1, file2):
>         print line1.rstrip(), "\t", line2.rstrip()
>
>     try:
>         for line1 in file1:
>             print line1,
>     except StopIteration: pass
>
>     try:
>         for line2 in file2:
>             print "\t",line2,
>     except StopIteration: pass
>
> if __name__ == "__main__":
>     # Use StringIO to simulate files. Real files
>     # show the same behavior.
>     f = cStringIO.StringIO
>
>     print "Two files with same number of lines work ok."
>     prt_files (f("abc\nde\nfgh\n"), f("xyz\nwv\nstu\n"))
>
>     print "\nFirst file shorter is also ok."
>     prt_files (f("abc\nde\n"), f("xyz\nwv\nstu\n"))
>
>     print "\nSecond file shorter is a problem."
>     prt_files (f("abc\nde\nfgh\n"), f("xyz\nwv\n"))
>     print "What happened to \"fgh\" line that should be in column 1?"
>
>     print "\nBut only a problem for one line."
>     prt_files (f("abc\nde\nfgh\nijk\nlm\n"), f("xyz\nwv\n"))
>     print "The line \"fgh\" is still missing, but following\n" \
>           "line(s) are ok! Looks like izip() ate a line."

-- 
http://mail.python.org/mailman/listinfo/python-list