I need to read up to an arbitrary delimiter, which might be any of a (small) set of characters. For the sake of the exercise, let's say it is either ! or ? (for example).
I want to read from files reasonably efficiently. I don't mind if there is a little overhead, but my first attempt is 100 times slower than the built-in "read to the end of the line" method.

Here is the function I came up with:

# Read a chunk of bytes/characters from an open file.
def chunkiter(f, delim):
    buffer = []
    b = f.read(1)
    while b:
        buffer.append(b)
        if b in delim:
            yield ''.join(buffer)
            buffer = []
        b = f.read(1)
    if buffer:
        yield ''.join(buffer)

And here is some test code showing how slow it is:

# Create a test file.
FILENAME = '/tmp/foo'
s = """\
abcdefghijklmnopqrstuvwxyz!
abcdefghijklmnopqrstuvwxyz?
""" * 500
with open(FILENAME, 'w') as f:
    f.write(s)

# Run some timing tests, comparing to reading lines from a file.
def readlines(f):
    f.seek(0)
    for line in f:
        pass

def readchunks(f):
    f.seek(0)
    for chunk in chunkiter(f, '!?'):
        pass

from timeit import Timer
SETUP = 'from __main__ import readlines, readchunks, FILENAME; '
SETUP += 'f = open(FILENAME)'
t1 = Timer('readlines(f)', SETUP)
t2 = Timer('readchunks(f)', SETUP)

# Time them.
x = t1.repeat(number=10)  # Ignore the first run, in case of caching issues.
x = min(t1.repeat(number=1000, repeat=9))
y = t2.repeat(number=10)
y = min(t2.repeat(number=1000, repeat=9))

print('reading lines:', x, 'reading chunks:', y)

On my laptop, the results I get are:

reading lines: 0.22584209218621254
reading chunks: 21.716224210336804

Is there a better way to read chunks from a file up to one of a set of arbitrary delimiters? Bonus for it working equally well with text and bytes.

(You can assume that the delimiters will be no more than one byte, or character, each. E.g. "!" or "?", but never "!?" or "?!".)

-- 
Steve
-- 
https://mail.python.org/mailman/listinfo/python-list
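[For comparison, one way to avoid the per-character `f.read(1)` overhead is to read in larger blocks and split them with a regular expression, carrying any partial chunk over to the next block. This is only a sketch, not code from the post above; the name `chunkiter` is reused for symmetry, and `blocksize` and the 4096-byte default are arbitrary choices:]

```python
import re

def chunkiter(f, delims, blocksize=4096):
    """Yield chunks of f, each ending with one character from delims.

    Reads the file in blocks of blocksize rather than one character
    at a time. Accepts either a text file with str delimiters or a
    binary file with bytes delimiters.
    """
    if isinstance(delims, bytes):
        # Capturing group keeps the delimiters in re.split's output.
        pat = re.compile(b'([%s])' % re.escape(delims))
        tail = b''
    else:
        pat = re.compile('([%s])' % re.escape(delims))
        tail = ''
    while True:
        block = f.read(blocksize)
        if not block:
            break
        # parts alternates: text, delim, text, delim, ..., trailing text.
        parts = pat.split(tail + block)
        for i in range(0, len(parts) - 1, 2):
            yield parts[i] + parts[i + 1]
        # The last element is an incomplete chunk; keep it for next block.
        tail = parts[-1]
    if tail:
        yield tail

# Example:
# from io import StringIO
# list(chunkiter(StringIO('abc!def?gh'), '!?'))
# -> ['abc!', 'def?', 'gh']
```

[The capturing group in the pattern is what makes `re.split` return the delimiters interleaved with the text, so each yielded chunk keeps its trailing delimiter, matching the behaviour of the one-character-at-a-time version.]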