Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-12 Thread John J. Lee
"Chris Mellon" <[EMAIL PROTECTED]> writes: [...] > The minimum bounds for a line is at least one byte (the newline) and > maybe more, depending on your data. You can seek() forward the minimum > amount of bytes that (1 billion -1) lines will consume and save > yourself some wasted IO. But how do y

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Terry Reedy
"Marc 'BlackJack' Rintsch" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] | On Wed, 08 Aug 2007 09:54:26 +0200, Méta-MCI \(MVP\) wrote: | | > Create a "index" (a file with 3,453,299,000 tuples : | > line_number + start_byte) ; this file has fix-length lines. | > slow, OK, but once. |

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Chris Mellon
On 8/8/07, Steve Holden <[EMAIL PROTECTED]> wrote: > Chris Mellon wrote: > > On 8/8/07, Ben Finney <[EMAIL PROTECTED]> wrote: > >> Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes: > >> > >>> On Aug 8, 2:35 am, Paul Rubin wrote: > Sullivan WxPyQtKinter <[EMAIL PROTEC

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Steve Holden
Chris Mellon wrote: > On 8/8/07, Ben Finney <[EMAIL PROTECTED]> wrote: >> Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes: >> >>> On Aug 8, 2:35 am, Paul Rubin wrote: Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes: > This program: > for i in range(1000

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Chris Mellon
On 8/8/07, Ben Finney <[EMAIL PROTECTED]> wrote: > Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes: > > > On Aug 8, 2:35 am, Paul Rubin wrote: > > > Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes: > > > > This program: > > > > for i in range(10): > > > >

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Bruno Desthuilliers
Ant wrote: > On Aug 8, 11:10 am, Bruno Desthuilliers <[EMAIL PROTECTED]> wrote: >> Jay Loden wrote: >> (snip) >> >>> If we just want to iterate through the file one line at a time, why not >>> just: >>> count = 0 >>> handle = open('hugelogfile.txt') >>> for line in handle.xreadlines(): >>>

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Bjoern Schliessmann
Peter Otten wrote: > n = 10**9 - 1 > assert n < sys.maxint > f = open(filename) > wanted_line = itertools.islice(f, n, None).next() > > should do slightly better than your implementation. It will do vastly better, at least in memory usage terms, because there is no memory-eating range() call. Reg
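For readers on Python 3, the islice snippet quoted above needs two small changes: `sys.maxint` no longer exists, and the `.next()` method became the built-in `next()`. A minimal sketch, assuming a 0-based line number `n`; the helper name `nth_line` is hypothetical:

```python
import itertools

def nth_line(path, n):
    """Return line n (0-based) of the file at path, or None if the file is shorter."""
    # Binary mode (an assumption, not from the thread) skips text decoding.
    with open(path, "rb") as f:
        # islice lazily skips the first n lines; next() takes the one we want,
        # falling back to None when the file runs out first.
        return next(itertools.islice(f, n, None), None)
```

This still reads the file sequentially up to line `n`, so it saves memory rather than I/O.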

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Ant
On Aug 8, 11:10 am, Bruno Desthuilliers wrote: > Jay Loden wrote: > (snip) > > > If we just want to iterate through the file one line at a time, why not > > just: > > > count = 0 > > handle = open('hugelogfile.txt') > > for line in handle.xreadlines(): > > count = count + 1 > > if coun

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Ben Finney
Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes: > On Aug 8, 2:35 am, Paul Rubin wrote: > > Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes: > > > This program: > > > for i in range(10): > > > f.readline() > > > is absolutely every slow > > > > There

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Marc 'BlackJack' Rintsch
On Wed, 08 Aug 2007 09:54:26 +0200, Méta-MCI \(MVP\) wrote: > Create a "index" (a file with 3,453,299,000 tuples : > line_number + start_byte) ; this file has fix-length lines. > slow, OK, but once. Why store the line number? The first start offset is for the first line, the second start offs
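With fixed-width records, the position of a record already encodes the line number, so the index only needs start offsets. A minimal sketch, assuming 8-byte little-endian unsigned offsets (enough for files over 4 GB); `build_index` and `read_line` are hypothetical names:

```python
import struct

# Assumed record format: one unsigned 64-bit little-endian offset per line.
RECORD = struct.Struct("<Q")

def build_index(data_path, index_path):
    """One slow pass: write the start offset of every line as a fixed-width record."""
    with open(data_path, "rb") as data, open(index_path, "wb") as index:
        offset = 0
        for line in data:
            index.write(RECORD.pack(offset))
            offset += len(line)

def read_line(data_path, index_path, n):
    """Return line n (0-based): record n sits at byte n * RECORD.size in the index."""
    with open(index_path, "rb") as index:
        index.seek(n * RECORD.size)
        raw = index.read(RECORD.size)
    if len(raw) != RECORD.size:
        raise IndexError(n)  # n is past the last indexed line
    (offset,) = RECORD.unpack(raw)
    with open(data_path, "rb") as data:
        data.seek(offset)
        return data.readline()
```

For 3,453,299,000 lines, an index in this format would be 27,626,392,000 bytes (about 27.6 GB). Building it costs one full pass over the log, but every later lookup is two seeks and two short reads.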

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Bruno Desthuilliers
Jay Loden wrote: (snip) > If we just want to iterate through the file one line at a time, why not just: > > count = 0 > handle = open('hugelogfile.txt') > for line in handle.xreadlines(): > count = count + 1 > if count == '10': > #do something for count, line in enumera
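The quoted loop compares `count` (an int) with `'10'` (a str), which is never true, so `#do something` never runs. A sketch along the lines of the `enumerate` loop suggested in reply, written for Python 3, where `xreadlines()` is gone and iterating the file object directly does the same job; `find_line` is a hypothetical name and `target` is 1-based:

```python
def find_line(path, target):
    """Return line number `target` (1-based), or None if the file is shorter."""
    with open(path, "rb") as handle:
        # enumerate replaces the manual `count = count + 1` bookkeeping.
        for count, line in enumerate(handle, start=1):
            # Compare int with int; `count == '10'` would never match.
            if count == target:
                return line
    return None
```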

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Méta-MCI (MVP)
Hi! Create an "index" (a file with 3,453,299,000 tuples: line_number + start_byte); this file has fixed-length lines. Slow, OK, but done only once. Then, for every lookup of a specific line: - direct-access read on the index - seek to the first byte of the desired line @+ Michel Claveau -- http://

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Jay Loden
Paul Rubin wrote: > Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes: >> This program: >> for i in range(10): >> f.readline() >> is absolutely every slow > > There are two problems: > > 1) range(10) builds a list of a billion elements in memory, > which is many gig

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-08 Thread Peter Otten
Sullivan WxPyQtKinter wrote: > I have a huge log file which contains 3,453,299,000 lines with > different lengths. It is not possible to calculate the absolute > position of the beginning of the one billionth line. Are there > efficient way to seek to the beginning of that line in python? > > Thi

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-07 Thread Sullivan WxPyQtKinter
On Aug 8, 2:35 am, Paul Rubin wrote: > Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes: > > This program: > > for i in range(10): > > f.readline() > > is absolutely every slow > > There are two problems: > > 1) range(10) builds a list of a bill

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-07 Thread Paul Rubin
Sullivan WxPyQtKinter <[EMAIL PROTECTED]> writes: > This program: > for i in range(1000000000): > f.readline() > is absolutely every slow There are two problems: 1) range(1000000000) builds a list of a billion elements in memory, which is many gigabytes and probably thrashing your
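The first problem comes from Python 2's `range()` returning a full list; `xrange()` there, or `range()` in Python 3, produces the numbers lazily. A minimal sketch of the original `readline()` loop with only that fix applied; `skip_lines` is a hypothetical helper that also stops cleanly at end of file:

```python
def skip_lines(f, n):
    """Advance file object f past n lines; return how many were actually skipped."""
    skipped = 0
    for _ in range(n):  # Python 3 range is lazy; in Python 2 this would be xrange
        if not f.readline():  # an empty result means end of file
            break
        skipped += 1
    return skipped
```

After `skip_lines(f, n)` returns `n`, the next `f.readline()` yields line `n` (0-based). This removes the memory blow-up but still makes one `readline()` call per skipped line; the `itertools.islice` approach from elsewhere in the thread avoids the explicit Python-level loop.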

Re: Seek the one billionth line in a file containing 3 billion lines.

2007-08-07 Thread Evan Klitzke
On 8/7/07, Sullivan WxPyQtKinter <[EMAIL PROTECTED]> wrote: > I have a huge log file which contains 3,453,299,000 lines with > different lengths. It is not possible to calculate the absolute > position of the beginning of the one billionth line. Are there > efficient way to seek to the beginning of