On Fri, 01 Aug 2008 00:46:09 -0700, Simon Strobl wrote:

> Hello,
>
> I tried to load a 6.8G large dictionary on a server that has 128G of
> memory. I got a memory error. I used Python 2.5.2. How can I load my
> data?
How do you know the dictionary takes 6.8G? I'm going to guess an answer
to my own question. In a later post, Simon wrote:

[quote]
I had a file bigrams.py with a content like below:

bigrams = {
    ", djy" : 75 ,
    ", djz" : 57 ,
    ", djzoom" : 165 ,
    ", dk" : 28893 ,
    ", dk.au" : 854 ,
    ", dk.b." : 3668 ,
    ...
    }
[end quote]

I'm guessing that the file is 6.8G of *text*. How much memory will it
take to import that? I don't know, but probably a lot more than 6.8G.
The compiler has to read the whole file in one giant piece, analyze it,
create all the string and int objects, and only then can it create the
dict. By my back-of-the-envelope calculations, the pointers alone will
require about 5GB, never mind the objects they point to.

I suggest storing your data as data, not as Python code. Create a text
file "bigrams.txt" with one key/value pair per line, like this:

djy : 75
djz : 57
djzoom : 165
dk : 28893
...

Then import it like so:

bigrams = {}
for line in open('bigrams.txt', 'r'):
    key, value = line.split(':')
    bigrams[key.strip()] = int(value.strip())

This will be slower, but because it only needs to read the data one line
at a time, it might succeed where trying to slurp all 6.8G in one piece
will fail.

-- 
Steven
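For completeness, here is a rough sketch of how a bigrams.py file in the
quoted form could be converted into that bigrams.txt format without ever
importing it. It is untested and assumes the file really does keep one
"key" : value , entry per line, exactly as in the sample Simon posted;
the file names and the regular expression are only illustrative.

import re

# Untested one-off converter (hypothetical file names): pulls the
# "key" : value pairs out of bigrams.py line by line and writes them in
# the plain "key : value" format used by the loader above. Keys keep
# whatever leading punctuation they had (e.g. ", djy").
entry = re.compile(r'^\s*"(?P<key>[^"]*)"\s*:\s*(?P<value>\d+)\s*,?\s*$')

src = open('bigrams.py')
dst = open('bigrams.txt', 'w')
for line in src:
    m = entry.match(line)
    if m:
        dst.write('%s : %s\n' % (m.group('key'), m.group('value')))
src.close()
dst.close()

Like the loader, this only ever holds one line of text in memory at a
time, so peak memory use is dominated by the dict being built rather
than by the 6.8G of source text.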