I just had to write some programs that crunched a lot of large files, both text and binary. As I use iterators more, I find myself wishing for some maybe-obvious enhancements:
1. File iterator for blocks of chars:

       f = open('foo')
       for block in f.iterchars(n=1024):
           ...

   iterates through 1024-character blocks from the file. The default
   iterator, which loops through lines, is not always a good choice,
   since a single line can use an unbounded amount of memory. The
   default n in the above should be 1 char. (A sketch of the intended
   behavior follows this list.)

2. Wrapped file openers: there should be functions (either in
   itertools, builtins, the sys module, or wherever) that open a file,
   expose one of the above iterators, then close the file, i.e.

       def file_lines(filename):
           with open(filename) as f:
               for line in f:
                   yield line

   so you can say

       for line in file_lines(filename):
           crunch(line)

   The current bogus idiom is "for line in open(filename)", but that
   does not promise to close the file once the file is exhausted (part
   of the motivation for the new "with" statement). There should
   similarly be a "file_chars" that uses the n-chars iterator instead
   of the line iterator (also sketched below).

3. itertools.ichain: yields the contents of each of a sequence of
   iterators, i.e.:

       def ichain(seq):
           for s in seq:
               for t in s:
                   yield t

   This is different from itertools.chain because it lazily evaluates
   its input sequence (see the demo after this list). Example
   application:

       all_filenames = ['file1', 'file2', 'file3']
       # loop through all the files, crunching all lines in each one
       for line in ichain(file_lines(x) for x in all_filenames):
           crunch(line)

4. functools enhancements (Haskell-inspired). Let f be a function of
   two arguments. Then:

       a) def flip(f): return lambda x, y: f(y, x)
       b) def lsect(x, f): return partial(f, x)
       c) def rsect(f, x): return partial(flip(f), x)

   lsect and rsect allow making what Haskell calls "sections".
   Example:

       # sequence of all squares less than 100
       from functools import partial
       from itertools import takewhile, count
       from operator import lt
       s100 = takewhile(rsect(lt, 100), (x*x for x in count()))
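Since file objects don't actually grow an iterchars() method today,
here is roughly the behavior I have in mind for #1, written as a free
function over an already-open file (the name iterchars is just my
proposal, nothing that exists):

    from functools import partial

    def iterchars(f, n=1):
        # Call f.read(n) repeatedly until it returns '' (end of file);
        # iter(callable, sentinel) packages that loop as an iterator.
        return iter(partial(f.read, n), '')

    f = open('foo')
    for block in iterchars(f, 1024):
        crunch(block)
    f.close()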
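And the matching file_chars from #2, again only a sketch of what I'd
want the stdlib version to do:

    def file_chars(filename, n=1):
        # Open the file, yield successive n-character blocks, and
        # close the file when the iterator is exhausted.
        with open(filename) as f:
            while True:
                block = f.read(n)
                if not block:
                    break
                yield block

(On 2.5 this needs "from __future__ import with_statement", same as
file_lines above.)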
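To make the laziness point in #3 concrete: chain has to be called as
chain(*seq), and the unpacking forces the whole input sequence before
anything is yielded, whereas ichain only touches each inner iterable
when it is reached. A small self-contained check:

    from itertools import islice, count

    def ichain(seq):
        for s in seq:
            for t in s:
                yield t

    # Infinitely many iterables: chain(*gens) would never return,
    # because the *gens unpacking tries to exhaust the generator.
    gens = (range(10 * i, 10 * i + 2) for i in count())
    assert list(islice(ichain(gens), 6)) == [0, 1, 10, 11, 20, 21]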
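Finally, a self-contained version of #4 with the imports spelled out,
plus a subtraction example to show which side lsect and rsect bind:
rsect(sub, 10) acts like Haskell's "subtract 10", i.e. a right
section.

    from functools import partial
    from itertools import takewhile, count
    from operator import lt, sub

    def flip(f):
        return lambda x, y: f(y, x)

    def lsect(x, f):
        return partial(f, x)

    def rsect(f, x):
        return partial(flip(f), x)

    assert lsect(10, sub)(3) == 7     # like (10 -) 3  ==  10 - 3
    assert rsect(sub, 10)(3) == -7    # like subtract 10 $ 3  ==  3 - 10

    # all squares less than 100, as above
    s100 = takewhile(rsect(lt, 100), (x * x for x in count()))
    assert list(s100) == [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]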