I was able to bring up any record out of 10 million or so on the screen in a
couple of seconds. (That was not in Python, but it should not make much difference; some things might even work better in Python.)
Even though my files were text, I read them as random-access binary files. You need to be able to dive in at an arbitrary point in the file, read a chunk of data, split it into lines, discard any partial lines at the beginning and end, pull out the keys, and see where you are. Even with a gigabyte of file, if you read a decent-sized chunk, you can binary-search down to the spot you want in 15-20 probes or so.

That's for the first search; after that you have a better idea where to look. Use a dictionary to save the information from each chunk, giving you an index that provides a head start on the next search. If you can keep 10k to 100k entries in that index, you can do thousands of searches before you even have to worry about having too many entries.
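Here's a rough Python sketch of that scheme. It assumes the file is sorted by a key in the first whitespace-separated field of each line, and that lines are far shorter than the chunk size; the names (key_at, lookup), the 64 KB chunk, and the dict layout are my illustration, not the original code:

import bisect

CHUNK = 64 * 1024   # a decent-sized read at each probe (my choice)

def key_at(f, pos):
    # Dive in at an arbitrary offset, read a chunk, throw away the
    # partial line we probably landed inside, and return (key, line_start)
    # for the first complete line.  Assumes lines are much shorter than
    # CHUNK and the key is the first whitespace-separated field.
    f.seek(pos)
    chunk = f.read(CHUNK)
    if pos > 0:
        nl = chunk.find(b"\n")
        pos += nl + 1
        chunk = chunk[nl + 1:]
    line = chunk.split(b"\n", 1)[0]
    return line.split(None, 1)[0], pos

def lookup(f, want, size, index):
    # Find the line whose key equals `want` (a bytes object) in a file
    # sorted by key.  `index` is a dict {key: line_offset} that every
    # probe adds to, so later searches start with a tighter bracket.
    lo, hi = 0, size
    known = sorted(index)
    i = bisect.bisect_right(known, want)
    if i > 0:
        lo = index[known[i - 1]]        # last known key <= want
    if i < len(known):
        hi = index[known[i]]            # first known key > want
    while hi - lo > CHUNK:
        mid = lo + (hi - lo) // 2       # overflow-safe form; see below
        k, line_start = key_at(f, mid)
        index[k] = line_start           # head start for the next search
        if k <= want:
            lo = line_start
        else:
            hi = mid
    f.seek(lo)                          # bracket is one chunk wide: scan it
    for raw in f.read(hi - lo + CHUNK).split(b"\n"):
        if raw.split(None, 1)[:1] == [want]:
            return raw
    return None

You'd open the file with open(path, "rb"), pass os.path.getsize(path) as size, and reuse the same index dict across calls, pruning it once it grows past your comfort level (say 100k entries).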
I did learn that on 32-bit hardware, a binary search of a file over a gigabyte will fail if you calculate the next place to look as (a+b)/2, because a+b can exceed 2 GB and overflow a signed 32-bit integer. You gotta do

    a + (b - a)/2
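Python's own integers don't overflow, but you can watch the failure mode by emulating C's signed 32-bit arithmetic (the numbers below are just illustrative offsets into a ~2 GB file):

def i32(n):
    # wrap n the way signed 32-bit C arithmetic does
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n

a, b = 1500000000, 1900000000
print(i32(a + b) // 2)        # naive midpoint: -447483648, negative garbage
print(a + i32(b - a) // 2)    # safe midpoint: 1700000000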
Al