On 8/8/07, Steve Holden <[EMAIL PROTECTED]> wrote:
> Chris Mellon wrote:
> > On 8/8/07, Ben Finney <[EMAIL PROTECTED]> wrote:
> >> Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes:
> >>
> >>> On Aug 8, 2:35 am, Paul Rubin <http://[EMAIL PROTECTED]> wrote:
> >>>> Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes:
> >>>>> This program:
> >>>>>     for i in range(1000000000):
> >>>>>         f.readline()
> >>>>> is absolutely very slow....
> >>>> There are two problems:
> >>>>
> >>>> 1) range(1000000000) builds a list of a billion elements in memory
> >> [...]
> >>>> 2) f.readline() reads an entire line of input
> >> [...]
> >>> Thank you for pointing out these two problems. I wrote this program
> >>> just to show how inefficient it is to use a seemingly NATIVE way
> >>> to seek in such a big file. No other intention........
> >> The native way isn't iterating over 'range(hugenum)', it's to use an
> >> iterator. Python file objects are iterable, only reading each line as
> >> needed and not creating a companion list.
> >>
> >>     logfile = open("foo.log", 'r')
> >>     for line in logfile:
> >>         do_stuff(line)
> >>
> >> This at least avoids the 'range' issue.
> >>
> >> To know when we've reached a particular line, use 'enumerate' to
> >> number each item as it comes out of the iterator.
> >>
> >>     logfile = open("foo.log", 'r')
> >>     target_line_num = 10**9
> >>     for (line_num, line) in enumerate(logfile):
> >>         if line_num < target_line_num:
> >>             continue
> >>         else:
> >>             do_stuff(line)
> >>             break
> >>
> >> As for reading each line: that's unavoidable if you want a specific
> >> line from a stream of variable-length lines.
> >>
> >
> > The minimum size of a line is one byte (the newline) and maybe
> > more, depending on your data. You can seek() forward the minimum
> > number of bytes that (1 billion - 1) lines will consume and save
> > yourself some wasted IO.
>
> Except that you will have to count the number of lines in that first
> billion characters in order to determine when to stop.
>
True. Perhaps you can tell from the data itself what line you want.
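
To make "tell from the data itself" concrete, here is a rough sketch. It
assumes something the original poster never told us: that every line
starts with an integer key (a sequence number, say) followed by a space,
and that the keys are in ascending order. With that, you can seek() to a
byte offset, throw away the partial line you landed in, and look at the
key of the next full line to decide which way to go, so no line counting
is needed. The names find_line, _key and make_sample are made up for
this example.

```python
import os
import tempfile


def _key(line):
    # Assumed (hypothetical) line format: b"<integer key> <payload>\n"
    return int(line.split(b" ", 1)[0])


def find_line(path, target):
    """Return the line whose leading key equals target, or None.

    Binary search over byte offsets; relies on keys being sorted ascending.
    """
    with open(path, "rb") as f:
        # The loop below only ever inspects lines that follow a newline,
        # so the very first line has to be checked separately.
        first = f.readline()
        if not first:
            return None
        if _key(first) == target:
            return first

        lo, hi = 0, os.path.getsize(path)
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid)
            f.readline()         # finish whatever line we landed in
            line = f.readline()  # first complete line after offset mid
            if not line or _key(line) >= target:
                hi = mid
            else:
                lo = mid + 1

        # lo is now the smallest offset whose following line has key >= target.
        f.seek(lo)
        f.readline()
        line = f.readline()
        if line and _key(line) == target:
            return line
        return None


def make_sample(path, count):
    # Write variable-length lines "0 ", "1 x", "2 xx", ... for a quick trial.
    with open(path, "wb") as f:
        for i in range(count):
            f.write(("%d %s\n" % (i, "x" * (i % 7))).encode("ascii"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.log")
        make_sample(sample, 1000)
        print(find_line(sample, 500))
```

Each probe costs one seek and two readline() calls, so the number of
reads grows with the logarithm of the file size rather than with the
number of lines. If the data carries no such key, Steve's objection
stands and you are back to reading line by line.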