On Aug 27, 10:45 am, Roy Smith <r...@panix.com> wrote:
> In article <4e592852$0$29965$c3e8da3$54964...@news.astraweb.com>,
>  Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
>
> > open("file.txt")   # opens the file
> >   .read()          # reads the contents of the file
> >   .split("\n\n")   # splits the text on double-newlines.
>
> The biggest problem with this code is that read() slurps the entire file
> into a string.  That's fine for moderately sized files, but will fail
> (or at least be grossly inefficient) for very large files.
>
> It's always annoyed me a little that while it's easy to iterate over the
> lines of a file, it's more complicated to iterate over a file character
> by character.  You could write your own generator to do that:
>
> for c in getchar(open("file.txt")):
>     whatever
>
> def getchar(f):
>     for line in f:
>         for c in line:
>             yield c
>
> but that's annoyingly verbose (and probably not hugely efficient).
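Incidentally, itertools can do the flattening for you: chain.from_iterable over the file yields the same character stream as the nested-loop getchar above, with less code (a sketch, reusing the getchar name from the quoted post):

```python
from itertools import chain

def getchar(f):
    # Flatten an iterable of lines into a stream of characters;
    # equivalent to the nested for-loops, but done in C.
    return chain.from_iterable(f)
```

Usage is the same: for c in getchar(open("file.txt")): ...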
read() takes an optional size parameter, so f.read(1) is another option...

> Of course, the next problem for the specific problem at hand is that
> even with an iterator over the characters of a file, split() only works
> on strings.  It would be nice to have a version of split which took an
> iterable and returned an iterator over the split components.  Maybe
> there is such a thing and I'm just missing it?

I don't know if there is such a thing, but for the OP's problem you could
read the file in chunks, e.g.:

def readgroup(f, delim, buffsize=8192):
    tail = ''
    while True:
        s = f.read(buffsize)
        if not s:
            yield tail
            break
        groups = (tail + s).split(delim)
        tail = groups[-1]
        for group in groups[:-1]:
            yield group

for group in readgroup(open('file.txt'), '\n\n'):
    # do something

Cheers - Chas
--
http://mail.python.org/mailman/listinfo/python-list
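P.S. For what it's worth, a quick check with a tiny buffer (StringIO standing in for the file, and readgroup repeated so the snippet runs on its own) confirms the tail logic copes with a delimiter that straddles a chunk boundary:

```python
import io

def readgroup(f, delim, buffsize=8192):
    # Same generator as above: carry the unterminated tail of each
    # chunk over into the next read, so a delimiter split across two
    # chunks is still found.
    tail = ''
    while True:
        s = f.read(buffsize)
        if not s:
            yield tail
            break
        groups = (tail + s).split(delim)
        tail = groups[-1]
        for group in groups[:-1]:
            yield group

# buffsize=4 forces the second '\n\n' to straddle a chunk boundary:
text = "abc\n\ndef\n\nghi"
print(list(readgroup(io.StringIO(text), '\n\n', buffsize=4)))
# ['abc', 'def', 'ghi']
```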