On May 2, 11:48 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> Paul Rubin <no.em...@nospam.invalid> writes:
> > looking at the spec more closely, there are 256 hash tables.. ...
>
> You know, there is a much simpler way to do this, if you can afford to
> use a few hundred MB of memory and you don't mind some load time when
> the program first starts. Just dump all the data sequentially into a
> file. Then scan through the file, building up a Python dictionary
> mapping data keys to byte offsets in the file (this is a few hundred MB
> if you have 3M keys). Then dump the dictionary as a Python pickle and
> read it back in when you start the program.
>
> You may want to turn off the cyclic garbage collector when building or
> loading the dictionary, as it can badly slow down the construction of
> big lists and maybe dicts (I'm not sure of the latter).
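
For anyone following along, here is roughly how I read that scheme -- an untested
sketch, where the file names, the newline-delimited record format, and the sample
keys are placeholders of my own, not anything from Paul's post:

    import gc
    import pickle

    DATA_FILE = "records.dat"       # placeholder names, not from the thread
    INDEX_FILE = "offsets.pickle"

    def build(records):
        """Write (key, value) pairs sequentially; pickle a key -> byte-offset index."""
        offsets = {}
        gc.disable()                # per Paul's note: the cyclic GC can slow big builds
        try:
            with open(DATA_FILE, "wb") as f:
                for key, value in records:
                    offsets[key] = f.tell()             # byte offset of this record
                    # assumes values contain no newlines; any framing scheme would do
                    f.write(value.encode("utf-8") + b"\n")
            with open(INDEX_FILE, "wb") as f:
                pickle.dump(offsets, f, protocol=pickle.HIGHEST_PROTOCOL)
        finally:
            gc.enable()

    def load_index():
        """Read the pickled offset dictionary back in at program start."""
        gc.disable()
        try:
            with open(INDEX_FILE, "rb") as f:
                return pickle.load(f)
        finally:
            gc.enable()

    def lookup(offsets, data_file, key):
        """Seek to the stored offset and read one record back."""
        data_file.seek(offsets[key])
        return data_file.readline().rstrip(b"\n").decode("utf-8")

    if __name__ == "__main__":
        build([("spam", "eggs"), ("foo", "bar")])
        index = load_index()
        with open(DATA_FILE, "rb") as data:
            print(lookup(index, data, "foo"))           # -> bar

The data file would stay open for the life of the program, with each lookup being
one seek plus one read; only the offset dictionary lives in memory.
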
I'm starting to lean toward the file-offset/seek approach. I'm writing some
benchmarks for it, comparing it against the more filesystem-based approach I
mentioned in my original post. I'll report back when I have results, but it's
already way past my bedtime tonight. Thanks for all your help and suggestions.
--
http://mail.python.org/mailman/listinfo/python-list