On Monday, 12 March 2012 20:31:35 UTC, MRAB wrote:
> On 12/03/2012 19:39, Virgil Stokes wrote:
> > I have a rather large ASCII file that is structured as follows
> >
> > header line
> > 9 nonblank lines with alphanumeric data
> > header line
> > 9 nonblank lines with alphanumeric data
> > ...
> > ...
> > ...
> > header line
> > 9 nonblank lines with alphanumeric data
> > EOF
> >
> > where a data set contains 10 lines (header + 9 nonblank) and there can
> > be several thousand data sets in a single file. In addition, *each
> > header has a unique ID code*.
> >
> > Is there a fast method for the retrieval of a data set from this large
> > file given its ID code?
> >
> Probably the best solution is to put it into a database. Have a look at
> the sqlite3 module.
>
> Alternatively, you could scan the file, recording the ID and the file
> offset in a dict so that, given an ID, you can seek directly to that
> file position.
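For the scan-and-seek approach, a minimal sketch could look like the one
below. It guesses that the ID is the first whitespace-separated token on
each header line (adjust for the real header format); the file name and
the example ID are placeholders.

def build_index(path):
    """Scan the file once, recording the byte offset of each header line.

    Assumes the ID is the first whitespace-separated token on the header
    line and that every data set is exactly header + 9 data lines.
    """
    index = {}
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            header = f.readline()
            if not header:
                break
            ident = header.split()[0].decode("ascii")
            index[ident] = offset
            for _ in range(9):      # skip the 9 data lines of this set
                f.readline()
    return index

def get_dataset(path, index, ident):
    """Seek straight to the data set for `ident` and return its 10 lines."""
    with open(path, "rb") as f:
        f.seek(index[ident])
        return [f.readline().decode("ascii") for _ in range(10)]

# index = build_index("big_file.txt")            # one pass over the file
# lines = get_dataset("big_file.txt", index, "SOME_ID")

The index has to be rebuilt (or pickled and reloaded) each run, but lookups
after that are a single seek plus ten readline() calls.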
I would have a look at bsddb, Tokyo (or Kyoto) Cabinet, or hamsterdb. If it's really going to get large and needs a full-blown server, maybe MongoDB/redis/hadoop...
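These embedded key-value stores all rest on the same idea: key each record
by its ID and let the library maintain the on-disk index. As a rough
illustration only, using Python's standard-library dbm module as a
stand-in for bsddb/Tokyo Cabinet, and with the same guessed header layout
and placeholder names as above:

import dbm

def load_into_kv_store(text_path, db_path):
    """One-off import: store each 10-line data set under its header ID."""
    db = dbm.open(db_path, "c")   # "c" creates the store if it doesn't exist
    try:
        with open(text_path) as src:
            while True:
                header = src.readline()
                if not header:
                    break
                ident = header.split()[0]
                body = "".join(src.readline() for _ in range(9))
                db[ident] = header + body
    finally:
        db.close()

def lookup(db_path, ident):
    """Fetch one data set by ID without touching the original text file."""
    db = dbm.open(db_path, "r")
    try:
        return db[ident].decode()
    finally:
        db.close()

# load_into_kv_store("big_file.txt", "datasets.db")
# print(lookup("datasets.db", "SOME_ID"))

Unlike the in-memory offset dict, the key-value copy persists between runs,
so the original file only needs to be scanned once.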