On Tue, Feb 1, 2011 at 9:12 AM, Alan Meyer <amey...@yahoo.com> wrote:
> On 01/26/2011 04:22 PM, MRAB wrote:
>>
>> On 26/01/2011 10:59, Xavier Heruacles wrote:
>>>
>>> I have do some log processing which is usually huge. The length of each
>>> line is variable. How can I get the last line?? Don't tell me to use
>>> readlines or something like linecache...
>>>
>> Seek to somewhere near the end and then read use readlines(). If you
>> get fewer than 2 lines then you can't be sure that you have the entire
>> last line, so seek a little farther from the end and try again.
>
> I think this has got to be the most efficient solution.
>
> You might get the source code for the open source UNIX utility "tail" and
> see how they do it.  It seems to work with equal speed no matter how large
> the file is and I suspect it uses MRAB's solution, but because it's written
> in C, it probably examines each character directly rather than calling a
> library routine like readlines.
>
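
How about mmapping the file and using rfind?

    import mmap
    import os

    def mapper(filename):
        # Map the file read-only and search backwards for the last two
        # newlines; the last line is whatever sits between them.
        with open(filename, 'rb') as f:
            mapping = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
            endIdx = mapping.rfind(b'\n')
            startIdx = mapping.rfind(b'\n', 0, endIdx)
            return mapping[startIdx + 1:endIdx]

    def seeker(filename):
        # Read a chunk from the end of the file, doubling its size until
        # it spans at least two lines, so the last one is complete.
        offset = -10
        with open(filename, 'rb') as f:
            while True:
                f.seek(offset, os.SEEK_END)
                lines = f.readlines()
                if len(lines) >= 2:
                    return lines[-1][:-1]
                offset *= 2

    In [1]: import timeit

    In [2]: timeit.timeit('finders.seeker("the-file")', 'import finders')
    Out[2]: 32.216405868530273

    In [3]: timeit.timeit('finders.mapper("the-file")', 'import finders')
    Out[3]: 16.805877208709717

the-file is a 120M file with ~500k lines. timeit runs the statement
1,000,000 times by default, so those numbers are total seconds for a
million calls each.

Both functions assume the last line has a trailing newline. It's easy to
correct if that's not the case. seeker will also raise an error if the
doubled offset ever reaches back past the start of the file, which can
happen with a very short file or a file containing a single line.

I think mmap works similarly on Windows (there you would pass
access=mmap.ACCESS_READ instead of prot=mmap.PROT_READ), but I've never
tried it there.

-- 
regards,
kushal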
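
P.S. Here is a rough sketch of the trailing-newline correction mentioned above, in case it helps. It is untimed and only one way of doing it: the name last_line is made up for this example, it is written for Python 3 (hence the bytes literals), and it uses access=mmap.ACCESS_READ so that the same call is accepted on both Unix and Windows. The empty-file check is there because mmap refuses to map a zero-length file.

```python
import mmap
import os


def last_line(filename):
    """Return the last line of the file as bytes, without its newline.

    Hypothetical helper: works whether or not the file ends with '\n'.
    """
    with open(filename, 'rb') as f:
        # An empty file cannot be mapped, so handle it up front.
        if os.fstat(f.fileno()).st_size == 0:
            return b''
        mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            end = len(mapping)
            # Skip a single trailing newline if there is one.
            if mapping[end - 1:end] == b'\n':
                end -= 1
            # rfind returns -1 when no earlier newline exists, which
            # makes start 0: the whole file is then the last line.
            start = mapping.rfind(b'\n', 0, end) + 1
            return mapping[start:end]
        finally:
            mapping.close()
```

Calling last_line("the-file") should then return the same bytes as mapper for a file that ends in a newline, and the final partial line for one that doesn't. Files with '\r\n' line endings would keep the '\r'; stripping that is left out to keep the sketch short.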