[EMAIL PROTECTED] wrote:

> I am manipulating lots of log files (about 500,000 files and about 30Gb
> in total) to get them into a little SQL db. Part of this process is
> "normalisation" and creating tables of common data. I am creating
> dictionaries for these in a simple {value,key} form.
>
> In terms of memory and performance what are the reasonable limits for a
> dictionary with a key and a 16 character string? eg; if I read in one
> of my tables from disk into a dictionary, what sizing is comfortable?
> 100,000 entries? 1,000,000 entries? Lookup times and memory
> requirements are my main worries.
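(for context, a minimal sketch of the kind of normalisation dictionary the
question describes, assuming each distinct 16-character value is interned
to a small integer id; the helper name and sample values are made up:)

# illustrative only: intern each distinct value to a small integer id,
# so the main fact table can store the id instead of the 16-char string
value_ids = {}          # maps value -> id (the "{value,key}" form above)

def normalise(value):
    # hypothetical helper: return the existing id, or assign the next one
    if value not in value_ids:
        value_ids[value] = len(value_ids)
    return value_ids[value]

print(normalise("2007-01-01T00:00"))   # 0
print(normalise("2007-01-01T00:05"))   # 1
print(normalise("2007-01-01T00:00"))   # 0 again -- already interned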
you don't specify what a "key" is, but the following piece of code took less than a minute to write, ran in roughly two seconds on my machine, and results in a CPython process that uses about 80 megabytes of memory. >>> d = {} >>> for i in range(1000000): ... k = str(i).zfill(16) ... d[k] = k ... >>> k '0000000000999999' since dictionaries use hash tables, the lookup time is usually independent of the dictionary size. also see: http://www.effbot.org/pyfaq/how-are-dictionaries-implemented.htm </F> -- http://mail.python.org/mailman/listinfo/python-list
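(for readers who want to reproduce those figures, here is a small
measurement sketch, written against Python 3, so the exact numbers will
differ from the session above; note that sys.getsizeof only counts the
dict's own hash table, not the key strings it references:)

import sys
import time
import timeit

for n in (100000, 1000000):
    start = time.perf_counter()
    d = {}
    for i in range(n):
        k = str(i).zfill(16)      # 16-character string key, as in the post
        d[k] = k
    build_time = time.perf_counter() - start

    # time repeated lookups of one existing key; with a hash table this
    # stays roughly constant whether the dict holds 100,000 or 1,000,000
    # entries
    probe = str(n - 1).zfill(16)
    lookup_time = timeit.timeit(lambda: d[probe], number=1000000)

    print("%9d entries: built in %.2fs, dict object ~%.1f MB, "
          "1,000,000 lookups in %.2fs"
          % (n, build_time, sys.getsizeof(d) / 2.0 ** 20, lookup_time))

the total process memory will be a good deal larger than what
sys.getsizeof reports, since each 16-character string object carries its
own overhead; the point of the lookup timing is simply that it should
barely change between the two sizes.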