On Wed, 30 Jul 2008 21:29:39 -0300, <[EMAIL PROTECTED]> wrote:
Are there any techniques I can use to strip a dictionary data structure down to the smallest possible memory overhead? I'm working on a project where my available RAM is limited to 2G, and I would like to use very large dictionaries rather than a traditional database.

Background: I'm trying to identify duplicate records in very large text-based transaction logs. I detect duplicates by creating a SHA1 checksum of each record and using this checksum as a dictionary key. This works great, except for several files so large that their associated checksum dictionaries exceed my workstation's 2G of RAM.
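For concreteness, here is a minimal sketch of the approach you describe, assuming one record per line and a hypothetical log path. Note that storing the 20-byte binary digest() as the key, rather than the 40-character hexdigest(), already halves the per-key payload:

    import hashlib

    def find_duplicates_sha1(path):
        # Sketch of the approach described above: one SHA1 digest per
        # record, used as a dictionary key.  Assumes each line of the
        # log is one record.
        seen = {}    # sha1 digest -> line number of first occurrence
        duplicates = []
        with open(path, "rb") as f:
            for lineno, line in enumerate(f, 1):
                digest = hashlib.sha1(line).digest()  # 20 raw bytes
                if digest in seen:
                    duplicates.append((seen[digest], lineno))
                else:
                    seen[digest] = lineno
        return duplicates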
You could use a different hash algorithm that yields a smaller value (crc32, for example, fits in a 32-bit integer), at the expense of more collisions and more processing time to verify the possible duplicates.
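A minimal sketch of that trade-off, under the same one-record-per-line assumption as above: the dict maps each 32-bit checksum to the byte offset of the first record that produced it, so memory use is two small integers per unique record instead of a 20-byte digest, and a checksum match is verified by re-reading the earlier record from disk:

    import zlib

    def find_duplicates_crc32(path):
        seen = {}        # crc32 value -> byte offset of first occurrence
        duplicates = []
        with open(path, "rb") as f, open(path, "rb") as verify:
            offset = 0
            for line in f:
                # Mask to normalize the sign across Python versions.
                crc = zlib.crc32(line) & 0xffffffff
                earlier = seen.get(crc)
                if earlier is None:
                    seen[crc] = offset
                else:
                    # Same checksum: re-read the earlier record to tell a
                    # genuine duplicate from a crc32 collision.
                    verify.seek(earlier)
                    if verify.readline() == line:
                        duplicates.append(offset)
                    # else: a collision between distinct records; a fuller
                    # version would keep a list of offsets per checksum.
                offset += len(line)
        return duplicates

With only 32 bits, the birthday bound means you should expect the first collision once the table holds on the order of 2**16 entries, so the verification step is not optional; it just trades memory for extra I/O on the (rare) colliding records.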
--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list