Yi Xing:

> Since different lines have different size, I 
> cannot use seek(). So I am thinking of building an index for the file 
> for fast access. Can anybody give me some tips on how to do this in 
> Python?

    It depends on the size of the files and the amount of memory and 
disk you may use. First suggestion would be an in-memory array.array of 
64 bit integers made from 2 'I' entries with each 64 bit integer 
pointing to the start of a set of n lines. Then to find a particular 
line number p you seek to a[p/n] and then read over p%n lines. The 
factor 'n' is adjusted to fit your data into memory. If this uses too 
much memory or scanning the file to build the index each time uses too 
much time then you can use an index file with the same layout instead.

    Neil
-- 
http://mail.python.org/mailman/listinfo/python-list

Reply via email to