Re: How to read from a file to an arbitrary delimiter efficiently?

BartC Sat, 27 Feb 2016 12:06:58 -0800

On 27/02/2016 16:35, BartC wrote:

On 25/02/2016 06:50, Steven D'Aprano wrote:

I have a need to read to an arbitrary delimiter, which might be any of a
(small) set of characters. For the sake of the exercise, let's say it is
either ! or ? (for example).

However those aren't the main reasons for the poor speed. The limiting
factor here is reading one byte at a time. Just a loop like this:

    while f.read(1):
        pass

without doing anything else, seems to take most of the time. (3.6
seconds, compared with 5.6 seconds of your readchunks() on a 6MB version
of your test file, on Python 2.7. readlines() took about 0.2 seconds.)

Any faster solutions would need to read more than one byte at a time.
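To make the byte-at-a-time cost easy to reproduce, here is a minimal timing sketch for Python 3. It is not the code behind the figures quoted above; the helper names, the 4096-byte block size and the synthetic sample file are all assumptions chosen for illustration.

```python
import os
import tempfile
import time


def count_bytes_one_at_a_time(path):
    """Read the file with one read(1) call per byte, as in the loop above."""
    n = 0
    with open(path, "rb") as f:
        while f.read(1):
            n += 1
    return n


def count_bytes_in_blocks(path, blocksize=4096):
    """Read the same file in larger blocks (blocksize is an arbitrary choice)."""
    n = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(blocksize)
            if not block:
                break
            n += len(block)
    return n


def make_sample_file(size=200000):
    """Write `size` filler bytes to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"x" * size)
    return path


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    sample = make_sample_file()
    try:
        for func in (count_bytes_one_at_a_time, count_bytes_in_blocks):
            count, elapsed = timed(func, sample)
            print(func.__name__, count, "bytes", round(elapsed, 4), "seconds")
    finally:
        os.remove(sample)
```

Both functions visit every byte, so any difference in the printed times comes from the number of read() calls rather than from the amount of data.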

I've done some more tests using Python 3.4, with the same 200,000-line, 6MB test file:


0.25 seconds       Scan the file with 'for line in f'
2.25 seconds       Scan the file with your readlines() routine
4.0  seconds       Scan the file with your readchunks() routine
0.65 seconds       Scan the file using a buffer

This last test uses a 64-byte buffer, reading not more than an extra 63 bytes, but resetting the file position to just past the end of each identified chunk so that any subsequent read works as expected.

This test (the code is too untidy to post) only checks for two specific delimiters (not an arbitrary string full of them). (It also counts EOF as a valid delimiter, so it counts one more chunk.)
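Since the actual test code isn't posted, the following is only a rough sketch of the general idea for Python 3, not the code that produced the timings. The function names, the default delimiters and the use of a relative seek() to step back are assumptions. It requires a seekable file opened in binary mode.

```python
import io


def read_to_delimiter(f, delimiters=(b"!", b"?"), bufsize=64):
    """Return the next chunk of binary file f, including its delimiter.

    Reads bufsize bytes at a time. When a delimiter is found, the file
    position is moved back to just past it, so a later read continues
    from the right place. Returns b"" once nothing is left.
    """
    pieces = []
    while True:
        buf = f.read(bufsize)
        if not buf:
            break  # EOF: whatever was collected so far is the last chunk
        # Offsets of the first occurrence of each delimiter in this buffer.
        hits = [i for i in (buf.find(d) for d in delimiters) if i != -1]
        if hits:
            cut = min(hits) + 1  # keep the delimiter in the chunk
            pieces.append(buf[:cut])
            # Step back over the bytes that were read past the delimiter.
            f.seek(cut - len(buf), io.SEEK_CUR)
            break
        pieces.append(buf)
    return b"".join(pieces)


def count_chunks(f, delimiters=(b"!", b"?"), bufsize=64):
    """Count chunks; trailing data with no delimiter counts as one chunk."""
    count = 0
    while read_to_delimiter(f, delimiters, bufsize):
        count += 1
    return count


if __name__ == "__main__":
    sample = io.BytesIO(b"one!two?three")
    print(count_chunks(sample))
```

In this sketch, data left over at EOF without a delimiter is returned as a final chunk, which is one way of treating EOF as a delimiter. A larger bufsize means more bytes are read past each delimiter and then read again on the next call.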

Increasing the buffer size doesn't help, and beyond 256 bytes it slowed things down (for this input), as it spends too long rereading data.


--
Bartc
--
https://mail.python.org/mailman/listinfo/python-list
