Yi Xing:
> Since different lines have different size, I
> cannot use seek(). So I am thinking of building an index for the file
> for fast access. Can anybody give me some tips on how to do this in
> Python?
It depends on the size of the files and on how much memory and disk you can use. My first suggestion would be an in-memory array.array of 64-bit integers (each made from 2 'I' entries), with each 64-bit integer holding the file offset of the start of a block of n lines. Then, to find line number p, you seek to the offset stored in a[p // n] and read past p % n lines. The factor n is adjusted so that the index fits in memory.

If this uses too much memory, or if scanning the file to rebuild the index each time takes too long, you can use an index file with the same layout instead.

Neil
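A minimal sketch of the block-index scheme described above, assuming Python 3.3 or later, where array.array has a 64-bit 'Q' typecode, so a single entry per block is used here instead of a pair of 'I' entries. The names build_index and get_line are hypothetical, line numbers are taken to be 0-based, and the file is opened in binary mode so that the stored offsets are plain byte positions.

```python
from array import array


def build_index(path, n):
    """Scan the file once and record the byte offset of every n-th line."""
    index = array("Q")  # unsigned 64-bit offsets (assumes Python 3.3+)
    offset = 0
    with open(path, "rb") as f:
        for lineno, line in enumerate(f):
            if lineno % n == 0:
                index.append(offset)
            offset += len(line)
    return index


def get_line(path, index, n, p):
    """Return line p (0-based) as bytes; raise IndexError if it is out of range."""
    block = p // n
    if p < 0 or block >= len(index):
        raise IndexError(p)
    with open(path, "rb") as f:
        f.seek(index[block])       # jump to the start of the block
        for _ in range(p % n):     # read past the lines before line p
            if not f.readline():
                raise IndexError(p)
        line = f.readline()
    if not line:
        raise IndexError(p)
    return line
```

A larger n gives a smaller index at the cost of reading up to n - 1 extra lines per lookup. For the index-file variant, the same array could be written out with its tofile() method and read back with fromfile().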