On Feb 21, 6:47 pm, Stefan Behnel <stefan...@behnel.de> wrote:
> intellimi...@gmail.com wrote:
> > I wrote a script to process textual data and extract phrases from
> > them, storing these phrases in a dictionary. It encounters a
> > MemoryError when there are about 11.18M keys in the dictionary, and
> > the size is about 1.5GB.
> > [...]
> > I have 1GB of physical memory and 3GB in pagefile. Is there a limit
> > to the size or number of entries that a single dictionary can
> > possess? By searching on the web I can't find a clue why this
> > problem occurs.
>
> Python dicts are only limited by what your OS returns as free memory.
> However, when a dict grows, it needs to resize, which means that it
> has to create a bigger copy of itself and redistribute the keys. For a
> dict that is already 1.5GB big, this can temporarily eat a lot more
> memory than you have, at least more than two times as much as the size
> of the dict itself.
>
> You may be better served with one of the dbm databases that come with
> Python. They live on-disk but do the usual in-memory caching. They'll
> likely perform a lot better than your OS level swap file.
>
> Stefan
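For context, the part of my script that runs out of memory boils down to roughly the sketch below (plain counts stand in for whatever the script really stores, and the names are made up just for illustration):

    # Stripped-down sketch of the current approach: every extracted phrase
    # goes into one big in-memory dict.  With ~11.18M distinct phrases the
    # dict is around 1.5GB, and the next resize has to build a larger table
    # alongside the old one before the old one can be freed.
    phrase_table = {}

    def record_phrase(phrase):
        phrase_table[phrase] = phrase_table.get(phrase, 0) + 1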
Ummm, I didn't know about the dbm databases. It seems there are many different modules for this kind of task: gdbm, Berkeley DB, cdb, etc. I need to implement a constant hashtable with a large number of keys, but only a small fraction of them will be accessed frequently, and read speed is crucial. It would be ideal if the implementation cached all the frequently used key/value pairs in memory. Which module should I use? And is there a way to specify the amount of memory it uses for caching? To make that concrete, I've put a rough sketch of what I have in mind below. BTW, the target platform is Linux. Thank you.
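Here is roughly what I had in mind, using the Berkeley DB bindings (the bsddb module) simply because DB.set_cachesize() looks like a way to control the cache memory. The 64MB figure and the file name are placeholders, and the helper function is just my own sketch:

    from bsddb import db   # Berkeley DB bindings shipped with Python 2.x

    def open_phrase_table(path, cache_bytes=64 * 1024 * 1024):
        # set_cachesize(gbytes, bytes, ncache) has to be called before
        # open(); it controls how much of the table Berkeley DB keeps
        # cached in memory.
        d = db.DB()
        d.set_cachesize(0, cache_bytes, 1)
        # The table is built beforehand; open it read-only as a hash db.
        d.open(path, dbtype=db.DB_HASH, flags=db.DB_RDONLY)
        return d

    table = open_phrase_table("phrases.db")    # hypothetical file name
    print table.get("some phrase")             # value as a string, or None
    table.close()

As far as I understand, keys and values both have to be byte strings, so anything else would need to be serialized first.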