On 15 Jan 2006 16:44:24 -0800, Paul Rubin <"http://phr.cx"@nospam.invalid> wrote:

>I find pretty often that I want to loop through characters in a file:
>
>    while True:
>        c = f.read(1)
>        if not c: break
>        ...
>
>or sometimes of some other blocksize instead of 1.  It would sure
>be easier to say something like:
>
>    for c in f.iterbytes(): ...
>
>or
>
>    for c in f.iterbytes(blocksize): ...
>
>this isn't anything terribly advanced but just seems like a matter of
>having the built-in types keep up with language features.  The current
>built-in iterator (for line in file: ...) is useful for text files but
>can potentially read strings of unbounded size, so it's inadvisable for
>arbitrary files.
>
>Does anyone else like this idea?
It's a pretty useful thing to do, but the edge cases are somewhat
complex.  When I just want the dumb version, I tend to write this:

    for chunk in iter(lambda: f.read(blocksize), ''):
        ...

which is only very slightly longer than your version.  I would like it
even more if iter() had been written with the impending doom of lambda
in mind, so that this would work:

    for chunk in iter('', f.read, blocksize):
        ...

But it's a bit late now.

Anyhow, here are some questions about your iterbytes():

  * Would it guarantee that the chunks returned were read using a
    single read?  If blocksize were a multiple of the filesystem block
    size, would it guarantee reads on block boundaries (where possible)?

  * How would it handle EOF?  Would it stop iterating immediately after
    the first short read, or would it wait for an empty return?

  * What would the buffering behavior be?  Could one interleave calls
    to .next() on whatever iterbytes() returns with calls to .read()
    on the file?

Jean-Paul
-- 
http://mail.python.org/mailman/listinfo/python-list
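The two-argument iter(callable, sentinel) idiom above can be exercised
on an in-memory file; here io.StringIO stands in for a real file opened
in text mode, where read() returns the empty string at EOF:

```python
import io

# iter() with two arguments calls the first argument repeatedly and
# stops iteration when it returns the sentinel (here, the empty string
# that read() produces once the file is exhausted).
f = io.StringIO("abcdefg")
blocksize = 3
chunks = list(iter(lambda: f.read(blocksize), ''))
print(chunks)  # → ['abc', 'def', 'g']
```

Note the final chunk may be shorter than blocksize; only the empty
string terminates the loop.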